Creating Lakehouse Mapping

9.2 Creating Lakehouse Mapping

For MySQL HeatWave Lakehouse to process data in Amazon S3, it needs to first map the data in files on S3 to a MySQL HeatWave Lakehouse table. Once the S3 data is provided, the schema for the Lakehouse table is inferred automatically by MySQL HeatWave Autopilot. After the data mapping is created, it allows the S3 data to be loaded into the Lakehouse table for querying by Lakehouse.

Note:

To map and load data into Lakehouse, the Amazon S3 bucket in which the data is located and the DB System must be in the same region.

This task requires the following:

A running DB system with an active MySQL HeatWave Cluster. See Creating a MySQL HeatWave Cluster.
Properly configured Lakehouse IAM role ARN. See Creating an IAM Role to Access an Amazon S3 Bucket.
S3 URIs for one or more files of similar data.

Do the following to create a Lakehouse data mapping:

Connect to a DB System that has a MySQL HeatWave Cluster with Lakehouse enabled. See Connecting from the Console.
Click the Data Management tab, and then click Create Lakehouse Mapping.
Source:

Note:
You can map more than one Amazon S3 bucket file to the destination schema and table, but all the files should be of the same format (i.e., they are all either CSV, Parquet, Avro, or JSON files). Use the Add new file button to create multiple entries for data sources,
- File path type: Select either of the following:
  - Prefix: Enables you to map all files present under an Amazon S3 bucket prefix to the destination schema and table.
  - Name: Enables you to map a specific file present in an Amazon S3 bucket to the destination schema and table.
  - Pattern: Enables you to map a set of files present in an Amazon S3 bucket defined by a regular expression to the destination schema and table.
- File path: Depending on the File path type you select, specify one of the following
  - The Amazon S3 bucket prefix path, s3://<bucket-name>/<prefix>
  - The Amazon S3 bucket path with the file name, s3://<bucket-name>/<prefix>/<filename>
  - The Amazon S3 bucket path containing a regular expression, s3://<bucket-name>/<regex> (the pattern of the regular expression follows the modified ECMAScript regular expression grammar)
- Strict mode: Select Default, Enabled, or Disabled, which controls how Lakehouse handles invalid or missing values for data-changing statements such as INSERT or UPDATE when loading data into MySQL HeatWave. This parameter overrides the Strict mode setting that you specify in the File parsing settings for CSV files. See Strict SQL Mode. The following shows what happens if parsing errors, formatting errors, or empty columns are found in a file during data loading, for each possible value of Strict mode:
  - Default: The situation is handled according to the MySQL server's SQL mode setting.
  - Enabled: An error is displayed, interrupting the operation.
  - Disabled: A warning is displayed, without interrupting the operation. Empty columns are automatically filled with default column values or NULL.
File parsing settings:
- Format: Select the format of the file you want to map. You can select CSV, Parquet, Avro, or JSON. The default format is CSV. All the files that you want to map should have the same format.
  
  Note:
  See MySQL HeatWave Lakehouse Limitations for the limitations for mapping and loading files of each of the supported formats.
  
  If you select the CSV format, you can set the following:
  - Skip rows: Specify the number of rows that is skipped from the beginning of the data file (applicable to every file included in the file list). By default, no rows are skipped, and the maximum rows to skip is 20.
  - Field delimiter: Specify one or more characters to enclose fields. The maximum field delimiter length is 64 characters. The default delimiter is ,.
  - Record delimiter: Specify one or more characters to delimit records. The maximum record delimiter length is 64 characters. The default delimiter is \n.
  - Escape character: Specify one or more characters to escape special characters. The escape character is \.
  - Quotation marks: Specify one or more characters to enclose fields. The default character to enclose fields is ".
  - Encoding: Specify the character set to map data. The default character set is utf8mb4. See Character Sets and Collations in MySQL.
  - Date format: See Date_format. The default date format is auto.
  - Time format: See String and Numeric Literals in Date and Time Context. The default time format is auto.
  - Strict mode: Specify whether the mapping takes place during data mapping in strict mode (Enabled) or non-strict mode (Disabled). See Strict SQL Mode
    - Default: The situation is handled according to the MySQL server's SQL mode setting.
    - Enabled: If parsing errors, formatting errors, or empty columns are found in the file, an error is displayed, interrupting the operation.
    - Disabled: If parsing errors, formatting errors, or empty columns are found in the file, a warning is displayed, without interrupting the operation. Empty columns are automatically filled with default column values or NULL.
  - Has header: Specify whether the file has a header.
    - Default: Default value is Disabled
    - Enabled: Treat the first row of the file as the header
    - Disabled: Treat the file as if it does not have a header.
  - Trim spaces: Select whether to remove leading and trailing spaces.
    - Enabled: Remove leading and trailing spaces.
    - Disabled: Do not remove leading and trailing spaces.
Destination: Specify the destination of the Lakehouse table into which the data is to be loaded.
- Schema: Name of the destination schema.
- Table: Name of the destination table.
Click Next.
Under Autopilot schema inference:
- Review the Definitions, Errors, and Warnings, and fix any reported issues.
- Click the Refresh button to refresh the step with the latest MySQL HeatWave schema and table information, to make sure the mapping remains valid after it was first generated.
Under MySQL HeatWave load:
- Note the Estimated memory required
- The Load into MySQL HeatWave after creating Lakehouse mapping option is selected by default if there are no Errors or Warnings; deselect if you want.
Do one of the following:
- Click Create Lakehouse Mapping to create the destination Lakehouse table (which remains empty after its creation, until data is loaded).
- Click the Back button to go back and change your mapping settings.
- Click the Cancel button to cancel the mapping.
- Click the Copy button to copy the DDL statement for data mapping into your clipboard.

Note:

Mapping data onto an existing Lakehouse table is not supported. Mapping will fail if the specified Destination points to a Lakehouse table that already exists. If Destination points to an existing InnoDB table on the target MySQL server, the mapping operation will change the destination table on the server into a Lakehouse table if the table is empty, or the operation will fail if the table is not empty.

If you have not selected Load into MySQL HeatWave after creating Lakehouse mapping, once the Lakehouse table is successfully created, you can load the mapped data into your MySQL HeatWave Cluster by following the instructions in Loading or Unloading Data into MySQL HeatWave Cluster.

Related Topics

Refreshing Lakehouse Data

Parent topic: Manage Data in MySQL HeatWave