跳到主要内容

ZIPF

声明

ZIPF(<s> , <N> , <gen>)

说明

该函数返回一个符合齐夫分布(Zipf-distributed)的整数,取值范围为 [0, N),分区的特征指数为 s

  • s 为特征指数,s 越大生成的序列越倾斜。将序列绘制成曲线时,曲线越陡峭。
  • sN 取值要求:必须是一个标量值,不随行迭代而变。例如,整形或浮点型常量、标量函数等,PL 里还可以是 @v11+@v3 等。s 的取值范围为 [1, +∞),N 的取值范围为 [1,16777215]。
  • zipf 算法实现时消耗的存储、计算资源和 N 的取值相关。算法空间复杂度为 O(N),每生成一个整数的时间复杂度为 O(logN)。所以,N 的取值范围被限制为 [1, 16777215]。
  • gen 是一个数值生成函数,通常使用 RANDOM() 函数生成。如果传入值是一个常量,则 zipf() 函数返回值也为一个定值。

示例

如下示例为使用 ZIPF() 函数返回符合齐夫分布的整数。

SELECT ZIPF(1, 10, RANDOM()) FROM TABLE(GENERATOR(6));
+-----------------------+
| ZIPF(1, 10, RANDOM()) |
+-----------------------+
| 4 |
| 5 |
| 2 |
| 1 |
| 0 |
| 2 |
+-----------------------+
6 row in set (0.002 sec)

SELECT ZIPF(1, 10, 0415) FROM TABLE(GENERATOR(6));
+-------------------+
| ZIPF(1, 10, 0415) |
+-------------------+
| 8 |
| 8 |
| 8 |
| 8 |
| 8 |
| 8 |
+-------------------+
6 row in set (0.002 sec)

SELECT ZIPF(ABS(-1), 23, RANDOM()) FROM DUAL;
+-----------------------------+
| ZIPF(ABS(-1), 23, RANDOM()) |
+-----------------------------+
| 1 |
+-----------------------------+
1 row in set (0.001 sec)

ZIPF() 函数的 s 取值对分布存在影响,示例如下。

SELECT  COUNT(*), ZIPF(1, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 350 | 9 |
| 369 | 8 |
| 450 | 7 |
| 488 | 6 |
| 559 | 5 |
| 727 | 4 |
| 877 | 3 |
| 1100 | 2 |
| 1755 | 1 |
| 3325 | 0 |
+----------+------+
10 row in set (0.003 sec)

SELECT COUNT(*), ZIPF(2, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 69 | 9 |
| 73 | 8 |
| 102 | 7 |
| 118 | 6 |
| 187 | 5 |
| 260 | 4 |
| 419 | 3 |
| 679 | 2 |
| 1632 | 1 |
| 6461 | 0 |
+----------+------+
10 row in set (0.003 sec)

SELECT COUNT(*), ZIPF(3, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 6 | 9 |
| 12 | 8 |
| 15 | 7 |
| 35 | 5 |
| 40 | 6 |
| 77 | 4 |
| 118 | 3 |
| 292 | 2 |
| 1106 | 1 |
| 8299 | 0 |
+----------+------+
10 row in set (0.003 sec)

SELECT COUNT(*), ZIPF(4, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 1 | 8 |
| 1 | 7 |
| 1 | 9 |
| 2 | 5 |
| 6 | 6 |
| 19 | 4 |
| 35 | 3 |
| 120 | 2 |
| 548 | 1 |
| 9267 | 0 |
+----------+------+
10 row in set (0.003 sec)

SELECT COUNT(*), ZIPF(5, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 2 | 5 |
| 3 | 4 |
| 18 | 3 |
| 31 | 2 |
| 309 | 1 |
| 9637 | 0 |
+----------+------+
6 row in set (0.003 sec)