MySQL HeatWave User Guide
The tpch Analytics Quickstart shows how to import data into the DB System using the MySQL Shell Parallel Table Import Utility, manually loading data into MySQL HeatWave Cluster, and running queries.
This quickstart contains the following sections:
Review the MySQL HeatWave Quickstart Requirements.
MySQL Shell 8.0.22 or higher. We recommend that you always run the latest version of MySQL Shell, which is currently MySQL version 9.3.0. You can download the latest version from MySQL downloads.
Linux system with gcc
and
make
libraries installed, to generate
tpch
sample data using the
dbgen
utility.
Examples in this Quickstart use the tpch
sample database, which is an ad-hoc decision support database
derived from the TPC
Benchmark™ H (TPC-H) specification. For an overview of
the tpch
schema, refer to the
Logical Database Design section of the
specification
document.
The following instructions describe how to generate
tpch
sample data using the
dbgen
utility. The instructions assume you
are on a Linux system that has gcc
and
make
libraries installed.
To generate tpch
sample data:
Download the TPC-H tools zip file from TPC Download Current.
Extract the zip file to a local directory on your system.
Change to the dbgen
directory and make
a copy of the makefile template.
$>cd 2.18.0/dbgen
$>cp makefile.suite makefile
Configure the variables CC
,
DATABASE
,
MACHINE
and WORKLOAD
in the makefile, as follows:
CC = gcc DATABASE = ORACLE MACHINE = LINUX WORKLOAD = TPCH
Run make to build the
dbgen
utility:
$> make
Run the following dbgen
command to
generate a 1GB set of data files for the
tpch
database:
$> ./dbgen -s 1
The operation may take a few minutes. When finished, the
following data files appear in the working directory, one
for each table in the tpch
database:
$> ls -1 *.tbl
customer.tbl
lineitem.tbl
nation.tbl
orders.tbl
partsupp.tbl
part.tbl
region.tbl
supplier.tbl
This topic describes how to create the tpch sample database on the DB System and import the sample data. The sample data must be available on the DB System before it can be loaded into the MySQL HeatWave Cluster.
Sample database creation and import operations are performed using MySQL Shell. We recommend that you always run the latest version of MySQL Shell, which is currently MySQL version 9.3.0. You can download the latest version from MySQL downloads".
The MySQL Shell Parallel Table Import Utility provides fast
data import for large data files. The utility analyzes an input
data file, divides it into chunks, and uploads the chunks to the
target DB System using parallel connections. The utility is
capable of completing a large data import many times faster than
a standard single-threaded upload using a LOAD
DATA
statement. For additional information, see
Parallel Table Import Utility.
To create the tpch sample database on the DB System and import data:
Start MySQL Shell with --mysql
option
and connect to the DB System endpoint.
$> mysqlsh --mysql Username@DBSystem_IP_Address_or_Host_Name
The --mysql
option opens a
ClassicSession
, which is required when
using the MySQL Shell Parallel Table Import Utility.
MySQL Shell opens in SQL
execution
mode by default. If it opens in JavaScript
(JS)
mode, run the following command to switch
to SQL
execution mode.
mysql-js> \sql
Create the tpch
sample database and
tables:
mysql-sql>CREATE DATABASE tpch character set utf8mb4;
mysql-sql>USE tpch;
mysql-sql>CREATE TABLE nation ( N_NATIONKEY INTEGER primary key, N_NAME CHAR(25) NOT NULL, N_REGIONKEY INTEGER NOT NULL, N_COMMENT VARCHAR(152));
mysql-sql>CREATE TABLE region ( R_REGIONKEY INTEGER primary key, R_NAME CHAR(25) NOT NULL, R_COMMENT VARCHAR(152));
mysql-sql>CREATE TABLE part ( P_PARTKEY INTEGER primary key, P_NAME VARCHAR(55) NOT NULL, P_MFGR CHAR(25) NOT NULL, P_BRAND CHAR(10) NOT NULL, P_TYPE VARCHAR(25) NOT NULL, P_SIZE INTEGER NOT NULL, P_CONTAINER CHAR(10) NOT NULL, P_RETAILPRICE DECIMAL(15,2) NOT NULL, P_COMMENT VARCHAR(23) NOT NULL );
mysql-sql>CREATE TABLE supplier ( S_SUPPKEY INTEGER primary key, S_NAME CHAR(25) NOT NULL, S_ADDRESS VARCHAR(40) NOT NULL, S_NATIONKEY INTEGER NOT NULL, S_PHONE CHAR(15) NOT NULL, S_ACCTBAL DECIMAL(15,2) NOT NULL, S_COMMENT VARCHAR(101) NOT NULL);
mysql-sql>CREATE TABLE partsupp ( PS_PARTKEY INTEGER NOT NULL, PS_SUPPKEY INTEGER NOT NULL, PS_AVAILQTY INTEGER NOT NULL, PS_SUPPLYCOST DECIMAL(15,2) NOT NULL, PS_COMMENT VARCHAR(199) NOT NULL, primary key (ps_partkey, ps_suppkey) );
mysql-sql>CREATE TABLE customer ( C_CUSTKEY INTEGER primary key, C_NAME VARCHAR(25) NOT NULL, C_ADDRESS VARCHAR(40) NOT NULL, C_NATIONKEY INTEGER NOT NULL, C_PHONE CHAR(15) NOT NULL, C_ACCTBAL DECIMAL(15,2) NOT NULL, C_MKTSEGMENT CHAR(10) NOT NULL, C_COMMENT VARCHAR(117) NOT NULL);
mysql-sql>CREATE TABLE orders ( O_ORDERKEY INTEGER primary key, O_CUSTKEY INTEGER NOT NULL, O_ORDERSTATUS CHAR(1) NOT NULL, O_TOTALPRICE DECIMAL(15,2) NOT NULL, O_ORDERDATE DATE NOT NULL, O_ORDERPRIORITY CHAR(15) NOT NULL, O_CLERK CHAR(15) NOT NULL, O_SHIPPRIORITY INTEGER NOT NULL, O_COMMENT VARCHAR(79) NOT NULL);
mysql-sql>CREATE TABLE lineitem ( L_ORDERKEY INTEGER NOT NULL, L_PARTKEY INTEGER NOT NULL, L_SUPPKEY INTEGER NOT NULL, L_LINENUMBER INTEGER NOT NULL, L_QUANTITY DECIMAL(15,2) NOT NULL, L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL, L_DISCOUNT DECIMAL(15,2) NOT NULL, L_TAX DECIMAL(15,2) NOT NULL, L_RETURNFLAG CHAR(1) NOT NULL, L_LINESTATUS CHAR(1) NOT NULL, L_SHIPDATE DATE NOT NULL, L_COMMITDATE DATE NOT NULL, L_RECEIPTDATE DATE NOT NULL, L_SHIPINSTRUCT CHAR(25) NOT NULL, L_SHIPMODE CHAR(10) NOT NULL, L_COMMENT VARCHAR(44) NOT NULL, primary key(L_ORDERKEY,L_LINENUMBER));
Verify that the tpch
schema and tables
were created:
mysql-sql> SHOW TABLES;
+----------------+
| Tables_in_tpch |
+----------------+
| customer |
| lineitem |
| nation |
| orders |
| part |
| partsupp |
| region |
| supplier |
+----------------+
Switch to JavaScript execution mode to use the Parallel Table Import Utility:
mysql>sql> \js
Execute the following operations to import the data into the
tpch
database on the DB System.
For information about the
util.importTable()
options used in the
following commands, see
Parallel Table Import Utility.
The number of parallel threads specified using the
threads
option depends on the number of
CPU cores of the shape. It is assumed that sample data
fields are terminated by the pipe "|
"
character.
mysql-js>util.importTable("nation.tbl",{table: "nation", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16})
mysql-js>util.importTable("region.tbl",{table: "region", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16})
mysql-js>util.importTable("part.tbl",{table: "part", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16})
mysql-js>util.importTable("supplier.tbl",{table: "supplier", fieldsTerminatedBy:"|" , bytesPerChunk:"100M", threads:16})
mysql-js>util.importTable("partsupp.tbl",{table: "partsupp", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16})
mysql-js>util.importTable("customer.tbl",{table: "customer", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16})
mysql-js>util.importTable("orders.tbl",{table: "orders", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16})
mysql-js>util.importTable("lineitem.tbl",{table: "lineitem", fieldsTerminatedBy:"|", bytesPerChunk:"100M", threads:16})
To load the tpch
sample data into the
MySQL HeatWave Cluster:
For MySQL HeatWave on AWS, load data into MySQL HeatWave Cluster using the MySQL HeatWave Console. See Manage Data in MySQL HeatWave with Workspaces in the MySQL HeatWave on AWS Service Guide.
Start MySQL Shell and connect to the DB System endpoint:
$> mysqlsh Username@DBSystem_IP_Address_or_Host_Name
MySQL Shell opens in SQL
execution
mode by default. If it opens in JavaScript
(JS)
mode, run the following command to switch
to SQL
execution mode.
mysql-js> \sql
Execute the following operations to prepare the
tpch
sample database tables and load them
into the MySQL HeatWave Cluster. The operations performed include
defining string column encodings, defining the secondary
engine, and executing SECONDARY_LOAD
operations.
mysql-sql>USE tpch;
mysql-sql>ALTER TABLE nation modify `N_NAME` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE nation modify `N_COMMENT` VARCHAR(152) COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE nation SECONDARY_ENGINE=RAPID;
mysql-sql>ALTER TABLE nation SECONDARY_LOAD;
mysql-sql>ALTER TABLE region modify `R_NAME` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE region modify `R_COMMENT` VARCHAR(152) COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE region SECONDARY_ENGINE=RAPID;
mysql-sql>ALTER TABLE region SECONDARY_LOAD;
mysql-sql>ALTER TABLE part modify `P_MFGR` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE part modify `P_BRAND` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE part modify `P_CONTAINER` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE part modify `P_COMMENT` VARCHAR(23) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE part SECONDARY_ENGINE=RAPID;
mysql-sql>ALTER TABLE part SECONDARY_LOAD;
mysql-sql>ALTER TABLE supplier modify `S_NAME` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE supplier modify `S_ADDRESS` VARCHAR(40) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE supplier modify `S_PHONE` CHAR(15) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE supplier SECONDARY_ENGINE=RAPID;
mysql-sql>ALTER TABLE supplier SECONDARY_LOAD;
mysql-sql>ALTER TABLE partsupp modify `PS_COMMENT` VARCHAR(199) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE partsupp SECONDARY_ENGINE=RAPID;
mysql-sql>ALTER TABLE partsupp SECONDARY_LOAD;
mysql-sql>ALTER TABLE customer modify `C_NAME` VARCHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE customer modify `C_ADDRESS` VARCHAR(40) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE customer modify `C_MKTSEGMENT` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE customer modify `C_COMMENT` VARCHAR(117) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE customer SECONDARY_ENGINE=RAPID;
mysql-sql>ALTER TABLE customer SECONDARY_LOAD;
mysql-sql>ALTER TABLE orders modify `O_ORDERSTATUS` CHAR(1) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE orders modify `O_ORDERPRIORITY` CHAR(15) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE orders modify `O_CLERK` CHAR(15) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE orders SECONDARY_ENGINE=RAPID;
mysql-sql>ALTER TABLE orders SECONDARY_LOAD;
mysql-sql>ALTER TABLE lineitem modify `L_RETURNFLAG` CHAR(1) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE lineitem modify `L_LINESTATUS` CHAR(1) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE lineitem modify `L_SHIPINSTRUCT` CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE lineitem modify `L_SHIPMODE` CHAR(10) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE lineitem modify `L_COMMENT` VARCHAR(44) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
mysql-sql>ALTER TABLE lineitem SECONDARY_ENGINE=RAPID;
mysql-sql>ALTER TABLE lineitem SECONDARY_LOAD;
Verify that the tpch
sample database
tables are loaded in the MySQL HeatWave Cluster by querying
LOAD_STATUS
data from the MySQL HeatWave
Performance Schema tables. Loaded tables have an
AVAIL_RPDGSTABSTATE
load status.
mysql-sql>USE performance_schema;
mysql-sql>SELECT NAME, LOAD_STATUS FROM rpd_tables,rpd_table_id WHERE rpd_tables.ID = rpd_table_id.ID;
+------------------------------+---------------------+ | NAME | LOAD_STATUS | +------------------------------+---------------------+ | tpch.supplier | AVAIL_RPDGSTABSTATE | | tpch.partsupp | AVAIL_RPDGSTABSTATE | | tpch.orders | AVAIL_RPDGSTABSTATE | | tpch.lineitem | AVAIL_RPDGSTABSTATE | | tpch.customer | AVAIL_RPDGSTABSTATE | | tpch.nation | AVAIL_RPDGSTABSTATE | | tpch.region | AVAIL_RPDGSTABSTATE | | tpch.part | AVAIL_RPDGSTABSTATE | +------------------------------+---------------------+
This topic describes how to query tpch
data
in the MySQL HeatWave Cluster. After tables are loaded into the
MySQL HeatWave Cluster, queries that qualify are automatically offloaded to
the MySQL HeatWave Cluster for accelerated processing. To run queries:
For MySQL HeatWave on AWS, run queries from the Query Editor in the MySQL HeatWave Console. See Running Queries in the MySQL HeatWave on AWS Service Guide.
Start MySQL Shell and connect to the DB System endpoint:
$> mysqlsh Username@DBSystem_IP_Address_or_Host_Name
MySQL Shell opens in SQL
execution
mode by default. If it opens in JavaScript
(JS)
mode, run the following command to switch
to SQL
execution mode.
mysql-js> \sql
Change to the tpch database:
mysql-sql> USE tpch;
Default schema set to `tpch`.Fetching table and column names from `tpch` for
auto-completion... Press ^C to stop.
Before running a query, use EXPLAIN
to
verify that the query can be offloaded to the MySQL HeatWave Cluster.
For example:
mysql-sql> EXPLAIN SELECT SUM(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= date '1994-01-01';
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: lineitem
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 56834662
filtered: 33.33
Extra: Using where; Using secondary engine RAPID
If the query can be offloaded, the Extra
column in the EXPLAIN
output reports
Using secondary engine RAPID
.
After verifying that the query can be offloaded, run the query and note the execution time.
mysql-sql> SELECT SUM(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= date '1994-01-01';
+------------------+
| revenue |
+------------------+
| 82752894454.9036 |
+------------------+
1 row in set (0.04 sec)
To compare the MySQL HeatWave Cluster execution time with DB System
execution time, disable the
use_secondary_engine
variable to see how long it takes to run the same query on
the DB System. For example:
mysql-sql>SET SESSION use_secondary_engine=OFF;
mysql-sql>SELECT SUM(l_extendedprice * l_discount) AS revenue FROM lineitem WHERE l_shipdate >= date '1994-01-01';
+------------------+ | revenue | +------------------+ | 82752894454.9036 | +------------------+ 1 row in set (24.20 sec)
For other tpch
sample database queries
that you can run, see
Additional tpch Queries.
For more information about running queries, refer to
Section 5.4, “Run Queries”.
This topic provides additional tpch queries that you can run to test the MySQL HeatWave Cluster.
TPCH-Q1: Pricing Summary Report Query
As described in the TPC Benchmark™ H (TPC-H)
specification: "The Pricing Summary Report Query provides a
summary pricing report for all lineitems shipped as of a
given date. The date is within 60 - 120 days of the greatest
ship date contained in the database. The query lists totals
for extended price, discounted extended price, discounted
extended price plus tax, average quantity, average extended
price, and average discount. These aggregates are grouped by
RETURNFLAG
and
LINESTATUS
, and listed in ascending order
of RETURNFLAG
and
LINESTATUS
. A count of the number of
lineitems in each group is included."
mysql-sql> SELECT
l_returnflag,
l_linestatus,
SUM(l_quantity) AS sum_qty,
SUM(l_extendedprice) AS sum_base_price,
SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
AVG(l_quantity) AS avg_qty,
AVG(l_extendedprice) AS avg_price,
AVG(l_discount) AS avg_disc,
COUNT(*) AS count_order
FROM
lineitem
WHERE
l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY
GROUP BY l_returnflag , l_linestatus
ORDER BY l_returnflag , l_linestatus;
TPCH-Q3: Shipping Priority Query
As described in the TPC Benchmark™ H (TPC-H)
specification: "The Shipping Priority Query retrieves the
shipping priority and potential revenue, defined as the sum
of l_extendedprice * (1-l_discount)
, of
the orders having the largest revenue among those that had
not been shipped as of a given date. Orders are listed in
decreasing order of revenue. If more than 10 unshipped
orders exist, only the 10 orders with the largest revenue
are listed."
mysql-sql> SELECT
l_orderkey,
SUM(l_extendedprice * (1 - l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM
customer,
orders,
lineitem
WHERE
c_mktsegment = 'BUILDING'
AND c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate < DATE '1995-03-15'
AND l_shipdate > DATE '1995-03-15'
GROUP BY l_orderkey , o_orderdate , o_shippriority
ORDER BY revenue DESC , o_orderdate
LIMIT 10;
TPCH-Q9: Product Type Profit Measure Query
As described in the TPC Benchmark™ H (TPC-H)
specification: "The Product Type Profit Measure Query finds,
for each nation and each year, the profit for all parts
ordered in that year that contain a specified substring in
their names and that were filled by a supplier in that
nation. The profit is defined as the sum of
[(l_extendedprice*(1-l_discount)) - (ps_supplycost
* l_quantity)]
for all lineitems describing parts
in the specified line. The query lists the nations in
ascending alphabetical order and, for each nation, the year
and profit in descending order by year (most recent first).
"
mysql-sql> SELECT
nation, o_year, SUM(amount) AS sum_profit
FROM
(SELECT
n_name AS nation,
YEAR(o_ORDERdate) AS o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
FROM
part
STRAIGHT_JOIN partsupp
STRAIGHT_JOIN lineitem
STRAIGHT_JOIN supplier
STRAIGHT_JOIN orders
STRAIGHT_JOIN nation
WHERE
s_suppkey = l_suppkey
AND ps_suppkey = l_suppkey
AND ps_partkey = l_partkey
AND p_partkey = l_partkey
AND o_ORDERkey = l_ORDERkey
AND s_nationkey = n_nationkey
AND p_name LIKE '%green%') AS profit
GROUP BY nation , o_year
ORDER BY nation , o_year DESC;
tpch
tables can be unloaded from MySQL HeatWave Cluster
using the following statements:
For MySQL HeatWave on AWS, unload data from MySQL HeatWave Cluster using the MySQL HeatWave Console. See Manage Data in MySQL HeatWave with Workspaces in the MySQL HeatWave on AWS Service Guide.
mysql-sql>USE tpch;
mysql-sql>ALTER TABLE customer SECONDARY_UNLOAD;
mysql-sql>ALTER TABLE lineitem SECONDARY_UNLOAD;
mysql-sql>ALTER TABLE nation SECONDARY_UNLOAD;
mysql-sql>ALTER TABLE orders SECONDARY_UNLOAD;
mysql-sql>ALTER TABLE part SECONDARY_UNLOAD;
mysql-sql>ALTER TABLE partsupp SECONDARY_UNLOAD;
mysql-sql>ALTER TABLE region SECONDARY_UNLOAD;
mysql-sql>ALTER TABLE supplier SECONDARY_UNLOAD;