TPC-H benchmark test for seekdb
This topic describes how to run a TPC-H benchmark test against seekdb by using the official TPC-H tools.
What is TPC-H
TPC-H is a decision support benchmark published by the Transaction Processing Performance Council (TPC). It simulates analytical workloads that are common in data warehousing and business intelligence, and it is widely used to evaluate query performance.
TPC-H evolved from TPC-D (1994). It models a data warehouse in Third Normal Form (3NF) and includes eight base tables. The primary metric is per-query response time, and the standard aggregate metric is QphH@size (queries per hour at a given scale factor). Because the workload is modeled after real-world production patterns, it can expose performance characteristics that simpler synthetic benchmarks may miss.
Prerequisites
Prepare the following in your test environment:

- JDK: JDK 1.8u131 or later.
- make: Install with `yum install make`.
- GCC: Install with `yum install gcc`.
- mysql-devel: Install with `yum install mysql-devel`.
- Python MySQL driver: Install with `sudo yum install MySQL-python`.
- prettytable: Install with `pip install prettytable`.
- JDBC driver: Use `mysql-connector-java-5.1.47`.
- TPC-H tools: Download from the TPC-H Tools download page.
- seekdb: Use seekdb in client/server mode. For deployment instructions, see Deploy seekdb by using yum install.
- Disk setup: Use three separate PL1-level disks for the log disk, clog disk, and data disk. Configure `/etc/oceanbase/seekdb.cnf` as follows:

  ```
  port=2881
  base-dir=/data/1/seekdb
  data-dir=/data/2/seekdb
  redo-dir=/data/3/seekdb
  ```
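If you are setting up a fresh host, the yum- and pip-installable prerequisites above can be installed in one pass. A minimal sketch, assuming a CentOS/RHEL host where `yum` and `pip` are available:

```bash
# Install the build and client dependencies listed above.
sudo yum install -y make gcc mysql-devel MySQL-python
# prettytable is installed through pip.
pip install prettytable
```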
Test setup
- Machines: Use two machines. Deploy the TPC-H tools on one machine, and deploy seekdb on the other.
- seekdb host spec: 4 vCPUs and 8 GB memory. The host uses three PL1-level disks (log/clog/data) as described above.
- Data size: 10 GB (scale factor 10).
Run TPC-H manually with the TPC-H tools
Step 1: Connect to seekdb
Connect with the MySQL client, replacing `xx.xx.xx.xx` with the IP address of the seekdb host:

```bash
mysql -hxx.xx.xx.xx -P2881 -uroot -p**** -A
```
Step 2: Tune the environment
Before running the benchmark, apply the recommended settings. Connect to seekdb and run:
```sql
ALTER SYSTEM FLUSH PLAN CACHE GLOBAL;
ALTER SYSTEM SET enable_sql_audit = false;
SELECT sleep(5);
ALTER SYSTEM SET enable_perf_event = false;
ALTER SYSTEM SET syslog_level = 'PERF';
ALTER SYSTEM SET enable_record_trace_log = false;
ALTER SYSTEM SET data_storage_warning_tolerance_time = '300s';
ALTER SYSTEM SET _data_storage_io_timeout = '600s';
ALTER SYSTEM SET trace_log_slow_query_watermark = '7d';
ALTER SYSTEM SET large_query_threshold = '0ms';
ALTER SYSTEM SET max_syslog_file_count = 300;
ALTER SYSTEM SET spill_compression_codec = 'lz4';
SET GLOBAL NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';
SET GLOBAL NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF';
SET GLOBAL NLS_TIMESTAMP_TZ_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF TZR TZD';
SET GLOBAL ob_query_timeout = 10800000000;
SET GLOBAL ob_trx_timeout = 10000000000;
SET GLOBAL ob_sql_work_area_percentage = 50;
ALTER SYSTEM SET default_table_store_format = 'column';
ALTER SYSTEM SET ob_enable_batched_multi_statement = 'true';
ALTER SYSTEM SET _io_read_batch_size = '128k';
ALTER SYSTEM SET _io_read_redundant_limit_percentage = 50;
SET GLOBAL parallel_degree_policy = AUTO;
SET GLOBAL parallel_servers_target = 10000;
SET GLOBAL collation_connection = utf8mb4_bin;
SET GLOBAL collation_database = utf8mb4_bin;
SET GLOBAL collation_server = utf8mb4_bin;
SET GLOBAL autocommit = 1;
ALTER SYSTEM SET _nested_loop_join_enabled = false;
```
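If you would rather not paste these statements interactively, you can save them to a file and apply them in one batch with the mysql client. A minimal sketch, assuming the statements above are saved as `tpch_tuning.sql` (a hypothetical file name) and the connection details from Step 1:

```bash
# Apply all tuning statements non-interactively; adjust the host, port, and password.
mysql -hxx.xx.xx.xx -P2881 -uroot -p**** -A < tpch_tuning.sql
```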
Step 3: Install the TPC-H tools
- Download the TPC-H tools from the TPC-H Tools download page.
- Unzip the package and enter the directory:

  ```bash
  unzip 7e965ead-8844-4efa-a275-34e35f8ab89b-tpc-h-tool.zip
  cd TPC-H\ V3.0.1/
  ```

- In `dbgen/`, copy `makefile.suite` to `Makefile`:

  ```bash
  cd dbgen/
  cp makefile.suite Makefile
  ```

- Edit `Makefile` (`vim Makefile`) and set `CC`, `DATABASE`, `MACHINE`, and `WORKLOAD`:

  ```makefile
  CC = gcc
  # Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
  #                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
  # Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
  #                                  SGI, SUN, U2200, VMS, LINUX, WIN32
  # Current values for WORKLOAD are: TPCH
  DATABASE = MYSQL
  MACHINE  = LINUX
  WORKLOAD = TPCH
  ```

- Edit `tpcd.h` (`vim tpcd.h`) and add the following macros:

  ```c
  #ifdef MYSQL
  #define GEN_QUERY_PLAN ""
  #define START_TRAN     "START TRANSACTION"
  #define END_TRAN       "COMMIT"
  #define SET_OUTPUT     ""
  #define SET_ROWCOUNT   "limit %d;\n"
  #define SET_DBASE      "use %s;\n"
  #endif
  ```
- Compile:

  ```bash
  make
  ```

  The output is as follows:

  ```
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o build.o build.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o driver.o driver.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bm_utils.o bm_utils.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rnd.o rnd.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o print.o print.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o load_stub.o load_stub.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o bcd2.o bcd2.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o speed_seed.o speed_seed.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o text.o text.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o permute.o permute.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o rng64.o rng64.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o dbgen build.o driver.o bm_utils.o rnd.o print.o load_stub.o bcd2.o speed_seed.o text.o permute.o rng64.o -lm
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o qgen.o qgen.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -c -o varsub.o varsub.c
  gcc -g -DDBNAME=\"dss\" -DLINUX -DMYSQL -DTPCH -DRNG_TEST -D_FILE_OFFSET_BITS=64 -O -o qgen build.o bm_utils.o qgen.o rnd.o varsub.o text.o bcd2.o permute.o speed_seed.o rng64.o -lm
  ```

  If the compilation succeeds, `dbgen` (the data generator), `qgen` (the query generator), and `dists.dss` are generated.
Step 4: Generate data
Generate 10 GB of data (scale factor 10):

```bash
./dbgen -s 10
mkdir tpch10
mv *.tbl tpch10
```
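Generating 10 GB in a single process can take a while. dbgen can also split the data set into chunks that you generate in parallel. A hedged sketch, assuming your dbgen build supports the standard `-C` (chunk count) and `-S` (chunk number) options:

```bash
# Generate the scale-factor-10 data set in 4 parallel chunks.
for s in 1 2 3 4; do
  ./dbgen -s 10 -C 4 -S "$s" -f &
done
wait
mkdir -p tpch10
# Chunked runs name files lineitem.tbl.1 ... lineitem.tbl.4, and so on.
mv *.tbl* tpch10
```

Note that chunked output is split into per-chunk files, so the load step would then have to reference each chunk file rather than a single `.tbl` file per table.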
Step 5: Generate query SQL
You can generate query SQL by following the steps in this section and then adjust it as needed. Alternatively, you can use the query SQL provided on GitHub.
If you use the GitHub SQL, set `cpu_num` in the SQL statements to 4, the concurrency used in this test.
Generate query SQL with the built-in TPC-H tools:
- Copy `qgen`, `dists.dss`, and the query templates in `queries` into a `mysql_sql` directory:

  ```bash
  mkdir mysql_sql
  cp qgen mysql_sql/
  cp dists.dss mysql_sql/
  cp queries/*.sql mysql_sql/
  ```

- In `mysql_sql/`, create a script named `gen.sh` (`vim gen.sh`) to generate the query SQL files:

  ```bash
  #!/usr/bin/bash
  # Generate the 22 TPC-H queries at scale factor 10.
  for i in {1..22}
  do
    ./qgen -d $i -s 10 > db"$i".sql
  done
  ```

- Run `gen.sh`:

  ```bash
  sh gen.sh
  ```

- Adjust the query SQL by converting line endings:

  ```bash
  dos2unix *
  ```

  You can view the adjusted query SQL on GitHub.
- Update the query concurrency. For example, for query Q1, add a `parallel()` hint (a scripted way to apply the hint to all 22 files is sketched after this list):

  ```sql
  SELECT /*+ parallel(4) */ -- Add the parallel hint for concurrent execution.
      l_returnflag,
      l_linestatus,
      sum(l_quantity) as sum_qty,
      sum(l_extendedprice) as sum_base_price,
      sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
      sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
      avg(l_quantity) as avg_qty,
      avg(l_extendedprice) as avg_price,
      avg(l_discount) as avg_disc,
      count(*) as count_order
  FROM
      lineitem
  WHERE
      l_shipdate <= date '1998-12-01' - interval '90' day
  GROUP BY
      l_returnflag,
      l_linestatus
  ORDER BY
      l_returnflag,
      l_linestatus;
  ```
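Rather than editing all 22 files by hand, you can inject the hint with a small script. A minimal sketch, assuming GNU sed, that the files generated above are named `db1.sql` through `db22.sql`, and that each query begins with a lowercase `select` as qgen emits:

```bash
# Prepend the parallel hint to the first select of each generated query file.
for i in $(seq 1 22); do
  sed -i '0,/^select/s//select \/*+ parallel(4) *\//' "db${i}.sql"
done
# Review multi-statement queries such as Q15 by hand; the first select there
# may not be the one you want to parallelize.
```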
Step 6: Create tables
Create a table definition file named `create_tpch_mysql_table_part.ddl` with the following content:

```sql
DROP TABLE IF EXISTS LINEITEM;
CREATE TABLE lineitem (
l_orderkey int(11) NOT NULL,
l_partkey int(11) NOT NULL,
l_suppkey int(11) NOT NULL,
l_linenumber int(11) NOT NULL,
l_quantity decimal(15,2) NOT NULL,
l_extendedprice decimal(15,2) NOT NULL,
l_discount decimal(15,2) NOT NULL,
l_tax decimal(15,2) NOT NULL,
l_returnflag char(1) DEFAULT NULL,
l_linestatus char(1) DEFAULT NULL,
l_shipdate date NOT NULL,
l_commitdate date DEFAULT NULL,
l_receiptdate date DEFAULT NULL,
l_shipinstruct varchar(25) DEFAULT NULL,
l_shipmode varchar(10) DEFAULT NULL,
l_comment varchar(44) DEFAULT NULL,
primary key(l_shipdate, l_orderkey, l_linenumber)
) row_format = condensed
partition by key (l_orderkey) partitions 32 with column group(each column);
alter table lineitem CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
DROP TABLE IF EXISTS ORDERS;
CREATE TABLE orders (
o_orderkey int(11) NOT NULL,
o_custkey int(11) NOT NULL,
o_orderstatus varchar(1) DEFAULT NULL,
o_totalprice decimal(15,2) DEFAULT NULL,
o_orderdate date NOT NULL,
o_orderpriority varchar(15) DEFAULT NULL,
o_clerk varchar(15) DEFAULT NULL,
o_shippriority int(11) DEFAULT NULL,
o_comment varchar(79) DEFAULT NULL,
PRIMARY KEY (o_orderkey, o_orderdate)
) row_format = condensed
partition by key(o_orderkey) partitions 32 with column group(each column);
alter table orders CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
DROP TABLE IF EXISTS PARTSUPP;
CREATE TABLE partsupp (
ps_partkey int(11) NOT NULL,
ps_suppkey int(11) NOT NULL,
ps_availqty int(11) DEFAULT NULL,
ps_supplycost decimal(15,2) DEFAULT NULL,
ps_comment varchar(199) DEFAULT NULL,
PRIMARY KEY (ps_partkey, ps_suppkey)) row_format = condensed
partition by key(ps_partkey) partitions 32 with column group(each column);
alter table partsupp CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
DROP TABLE IF EXISTS PART;
CREATE TABLE part (
p_partkey int(11) NOT NULL,
p_name varchar(55) DEFAULT NULL,
p_mfgr varchar(25) DEFAULT NULL,
p_brand varchar(10) DEFAULT NULL,
p_type varchar(25) DEFAULT NULL,
p_size int(11) DEFAULT NULL,
p_container varchar(10) DEFAULT NULL,
p_retailprice decimal(12,2) DEFAULT NULL,
p_comment varchar(23) DEFAULT NULL,
PRIMARY KEY (p_partkey)) row_format = condensed
partition by key(p_partkey) partitions 32 with column group(each column);
alter table part CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
DROP TABLE IF EXISTS CUSTOMER;
CREATE TABLE customer (
c_custkey int(11) NOT NULL,
c_name varchar(25) DEFAULT NULL,
c_address varchar(40) DEFAULT NULL,
c_nationkey int(11) DEFAULT NULL,
c_phone varchar(15) DEFAULT NULL,
c_acctbal decimal(15,2) DEFAULT NULL,
c_mktsegment char(10) DEFAULT NULL,
c_comment varchar(117) DEFAULT NULL,
PRIMARY KEY (c_custkey)) row_format = condensed
partition by key(c_custkey) partitions 32 with column group(each column);
alter table customer CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
DROP TABLE IF EXISTS SUPPLIER;
CREATE TABLE supplier (
s_suppkey int(11) NOT NULL,
s_name varchar(25) DEFAULT NULL,
s_address varchar(40) DEFAULT NULL,
s_nationkey int(11) DEFAULT NULL,
s_phone varchar(15) DEFAULT NULL,
s_acctbal decimal(15,2) DEFAULT NULL,
s_comment varchar(101) DEFAULT NULL,
PRIMARY KEY (s_suppkey)
) row_format = condensed partition by key(s_suppkey) partitions 32 with column group(each column);
alter table supplier CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
DROP TABLE IF EXISTS NATION;
CREATE TABLE nation (
n_nationkey int(11) NOT NULL,
n_name varchar(25) DEFAULT NULL,
n_regionkey int(11) DEFAULT NULL,
n_comment varchar(152) DEFAULT NULL,
PRIMARY KEY (n_nationkey)
) row_format = condensed with column group(each column);
alter table nation CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
DROP TABLE IF EXISTS REGION;
CREATE TABLE region (
r_regionkey int(11) NOT NULL,
r_name varchar(25) DEFAULT NULL,
r_comment varchar(152) DEFAULT NULL,
PRIMARY KEY (r_regionkey)
) row_format = condensed with column group(each column);
alter table region CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
CREATE VIEW revenue0 AS
SELECT l_suppkey as supplier_no,
SUM(l_extendedprice * ( 1 - l_discount )) as total_revenue
FROM lineitem
WHERE l_shipdate >= DATE '1996-01-01'
AND l_shipdate < DATE '1996-04-01'
GROUP BY l_suppkey;
```
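The load script in the next step executes this DDL for you. If you want to apply and verify it by hand first, here is a minimal sketch using the connection details from Step 1 and the `test` database:

```bash
# Create the schema and confirm that all eight tables and the revenue0 view exist.
mysql -hxx.xx.xx.xx -P2881 -uroot -p**** -Dtest < create_tpch_mysql_table_part.ddl
mysql -hxx.xx.xx.xx -P2881 -uroot -p**** -Dtest -e "show tables;"
```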
Step 7: Load data
You can write your own load script based on the generated data files and SQL. The following example shows one approach.
- Create a directory for load scripts:

  ```bash
  mkdir load
  cd load
  cp xx/create_tpch_mysql_table_part.ddl xx/tpch10/
  ```
- Create a data load script named `load.py`:

  ```python
  #!/usr/bin/env python
  #-*- encoding:utf-8 -*-
  # Note: this is a Python 2 script; it uses the commands module.
  import os
  import sys
  import time
  import commands

  hostname='$host_ip'   # IP address of the seekdb host
  port='$host_port'     # seekdb port
  user='$user'          # Username
  password='$password'  # Password
  data_path='$path'     # Directory on the seekdb host where the .tbl files are located
  db_name='$db_name'    # Database name

  # Create tables
  cmd_str='mysql -h%s -P%s -u%s -p%s -D%s < create_tpch_mysql_table_part.ddl'%(hostname,port,user,password,db_name)
  result = commands.getstatusoutput(cmd_str)
  print result
  cmd_str='mysql -h%s -P%s -u%s -p%s -D%s -e "show tables;" '%(hostname,port,user,password,db_name)
  result = commands.getstatusoutput(cmd_str)
  print result

  # Load each table. The -c client flag keeps the /*+ parallel(16) */ hint from being stripped.
  cmd_str=""" mysql -h%s -P%s -u%s -p%s -c -D%s -e "load data /*+ parallel(16) */ infile '%s/customer.tbl' into table customer fields terminated by '|';" """ %(hostname,port,user,password,db_name,data_path)
  result = commands.getstatusoutput(cmd_str)
  print result
  cmd_str=""" mysql -h%s -P%s -u%s -p%s -c -D%s -e "load data /*+ parallel(16) */ infile '%s/lineitem.tbl' into table lineitem fields terminated by '|';" """ %(hostname,port,user,password,db_name,data_path)
  result = commands.getstatusoutput(cmd_str)
  print result
  cmd_str=""" mysql -h%s -P%s -u%s -p%s -c -D%s -e "load data /*+ parallel(16) */ infile '%s/nation.tbl' into table nation fields terminated by '|';" """ %(hostname,port,user,password,db_name,data_path)
  result = commands.getstatusoutput(cmd_str)
  print result
  cmd_str=""" mysql -h%s -P%s -u%s -p%s -c -D%s -e "load data /*+ parallel(16) */ infile '%s/orders.tbl' into table orders fields terminated by '|';" """ %(hostname,port,user,password,db_name,data_path)
  result = commands.getstatusoutput(cmd_str)
  print result
  cmd_str=""" mysql -h%s -P%s -u%s -p%s -c -D%s -e "load data /*+ parallel(16) */ infile '%s/partsupp.tbl' into table partsupp fields terminated by '|';" """ %(hostname,port,user,password,db_name,data_path)
  result = commands.getstatusoutput(cmd_str)
  print result
  cmd_str=""" mysql -h%s -P%s -u%s -p%s -c -D%s -e "load data /*+ parallel(16) */ infile '%s/part.tbl' into table part fields terminated by '|';" """ %(hostname,port,user,password,db_name,data_path)
  result = commands.getstatusoutput(cmd_str)
  print result
  cmd_str=""" mysql -h%s -P%s -u%s -p%s -c -D%s -e "load data /*+ parallel(16) */ infile '%s/region.tbl' into table region fields terminated by '|';" """ %(hostname,port,user,password,db_name,data_path)
  result = commands.getstatusoutput(cmd_str)
  print result
  cmd_str=""" mysql -h%s -P%s -u%s -p%s -c -D%s -e "load data /*+ parallel(16) */ infile '%s/supplier.tbl' into table supplier fields terminated by '|';" """ %(hostname,port,user,password,db_name,data_path)
  result = commands.getstatusoutput(cmd_str)
  print result
  ```
- Run the load script:

  ```bash
  python load.py
  ```

  Example output:

  ```
  (0, '')
  (0, 'Tables_in_test\ncustomer\nlineitem\nnation\norders\npart\npartsupp\nregion\nrevenue0\nsupplier')
  (0, '')
  (0, '')
  (0, '')
  (0, '')
  (0, '')
  (0, '')
  (0, '')
  (0, '')
  ```
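Before moving on, it is worth sanity-checking the row counts. A minimal sketch using the same placeholders as `load.py`; for scale factor 10, expect roughly 60 million `lineitem` rows, 15 million `orders`, and 1.5 million `customer` rows:

```bash
# Verify row counts for the three largest tables after the load.
mysql -h$host_ip -P$host_port -u$user -p$password -D$db_name -e "
  select 'lineitem' as tbl, count(*) as cnt from lineitem
  union all select 'orders', count(*) from orders
  union all select 'customer', count(*) from customer;"
```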
After the data is loaded, run a major compaction and collect statistics.

- Run a major compaction:

  ```sql
  ALTER SYSTEM MAJOR FREEZE;
  ```

- Check whether the major compaction is complete (a scripted polling approach is sketched after this list):

  ```sql
  SELECT FROZEN_SCN, LAST_SCN
  FROM oceanbase.CDB_OB_MAJOR_COMPACTION;
  ```

  Note: The compaction is complete when all `FROZEN_SCN` values are equal to their corresponding `LAST_SCN` values.
- Collect statistics. Create a file named `analyze_table.sql`:

  ```sql
  call dbms_stats.gather_table_stats(NULL, 'part', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');
  call dbms_stats.gather_table_stats(NULL, 'lineitem', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');
  call dbms_stats.gather_table_stats(NULL, 'customer', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');
  call dbms_stats.gather_table_stats(NULL, 'orders', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');
  call dbms_stats.gather_table_stats(NULL, 'partsupp', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');
  call dbms_stats.gather_table_stats(NULL, 'supplier', degree=>128, granularity=>'AUTO', method_opt=>'FOR ALL COLUMNS SIZE 128');
  ```

  Then run it from the MySQL client:

  ```sql
  source analyze_table.sql
  ```
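If you script the benchmark end to end, the compaction check above can be polled instead of run by hand. A minimal sketch using the same placeholders as `load.py`:

```bash
# Block until every FROZEN_SCN matches its LAST_SCN.
while true; do
  pending=$(mysql -h$host_ip -P$host_port -u$user -p$password -N -e \
    "select count(*) from oceanbase.CDB_OB_MAJOR_COMPACTION where FROZEN_SCN != LAST_SCN;")
  [ "$pending" = "0" ] && break
  sleep 10
done
```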
Step 8: Run the benchmark
You can write a benchmark script based on your generated data and SQL files. The following example shows one approach.
- Create a script named `tpch.sh`:

  ```bash
  #!/bin/bash
  # Note: there must be no space between -p and the password, or the client prompts interactively.
  TPCH_TEST="mysql -h$host_ip -P$host_port -uroot -Dtest -p$password -c"
  # Warm-up runs
  for i in {1..22}
  do
    sql1="source db${i}.sql"
    echo $sql1 | $TPCH_TEST > db${i}.log || ret=1
  done
  # Benchmark run
  for i in {1..22}
  do
    starttime=`date +%s%N`
    echo `date '+[%Y-%m-%d %H:%M:%S]'` "BEGIN Q${i}"
    sql1="source db${i}.sql"
    echo $sql1 | $TPCH_TEST > db${i}.log || ret=1
    stoptime=`date +%s%N`
    costtime=`echo $stoptime $starttime | awk '{printf "%0.2f\n", ($1 - $2) / 1000000000}'`
    echo `date '+[%Y-%m-%d %H:%M:%S]'` "END,COST ${costtime}s"
  done
  ```

- Run the script:

  ```bash
  sh tpch.sh
  ```
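`tpch.sh` prints one `COST` line per query. To get a total, capture the output and sum those lines. A minimal sketch, assuming the log format produced by the script above and a hypothetical log file name `tpch_run.log`:

```bash
# Run the benchmark, keep a log, and total the 22 per-query times.
sh tpch.sh | tee tpch_run.log
grep -o 'COST [0-9.]*s' tpch_run.log | \
  awk '{gsub(/s/, "", $2); sum += $2} END {printf "Total: %.2f s\n", sum}'
```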
Benchmark results
For an example of test results, see seekdb TPC-H test report.
FAQ
- Import fails with `File not exist`:

  ```
  ERROR 1017 (HY000) at line 1: File not exist
  ```

  The `.tbl` files must be located on the seekdb host, because the data load is performed locally on the server.

- Import fails with `Access denied`:

  ```
  ERROR 1227 (42501) at line 1: Access denied
  ```

  Grant the required privileges, for example:

  ```sql
  grant file on *.* to tpch_100g_part;
  ```