MySQL Shell's parallel table import utility util.importTable() provides rapid data import to a MySQL relational table from large data files. The utility analyzes an input data file, distributes it into chunks, and uploads the chunks to the target MySQL server using parallel connections. The utility can complete a large data import many times faster than a standard single-threaded upload using a LOAD DATA statement.
MySQL Shell's parallel table import utility supports the output from MySQL Shell's table export utility, which can compress the data file it produces and can export it to a local folder or an Object Storage bucket. The parallel table import utility's default dialect matches the default output format of the table export utility. The parallel table import utility can also be used to upload files from other sources.
The data file or files to be imported can be in any of the following locations:
A location that is accessible to the client host as a local disk.
A remote location that is accessible to the client host through HTTP or HTTPS, specified with a URL. Pattern matching is not supported for files accessed in this way.
An Oracle Cloud Infrastructure Object Storage bucket.
The data is imported to a single relational table in the MySQL server to which the active MySQL session is connected.
        When you run the parallel table import utility, you specify the
        mapping between the fields in the data file or files, and the
        columns in the MySQL table. You can set field- and line-handling
        options as for the LOAD DATA
        statement to handle data files in arbitrary formats. For
        multiple files, all the files must be in the same format. The
        default dialect for the utility maps to a file created using a
        SELECT...INTO
        OUTFILE statement with the default settings for that
        statement. The utility also has preset dialects that map to the
        standard data formats for CSV files (created on DOS or UNIX
        systems), TSV files, and JSON, and you can customize these using
        the field- and line-handling options as necessary. Note that
        JSON data must be in document-per-line format.
      
A number of functions have been added to the parallel table import utility since it was introduced, so use the most recent version of MySQL Shell to get the utility's full functionality.
              The parallel table import utility can capture columns from
              the data file or files for input preprocessing, in the
              same way as with a LOAD
              DATA statement. The selected data can be
              discarded, or you can transform the data and assign it to
              a column in the target table.
            
              The data can be imported from a location that is
              accessible to the client host as a local disk, or from an
              Oracle Cloud Infrastructure Object Storage bucket,
              specified by the osBucketName option.
The parallel table import utility can import a single input data file to a single relational table. It can also import a specified list of files, and it supports wildcard pattern matching to include all relevant files from a location. Multiple files uploaded by a single run of the utility are placed into a single relational table, so, for example, data that has been exported from multiple hosts can be merged into a single table for analytics.
              The parallel table import utility can accept an
              uncompressed input data file. The utility analyzes the
              data file, distributes it into chunks, and uploads the
              chunks to the relational table in the target MySQL server,
              dividing the chunks up between the parallel connections.
              The utility can also accept data files compressed in the
              gzip (.gz) and
              zstd (.zst) formats,
              detecting the format automatically based on the file
              extension. The utility uploads a compressed file from
              storage in the compressed format, saving bandwidth for
              that part of the transfer. Compressed files cannot be
              distributed into chunks, so instead the utility uses its
              parallel connections to decompress and upload multiple
              files simultaneously to the target server. If there is
              only one input data file, the upload of a compressed file
              can only use a single connection.
            
        MySQL Shell's dump loading utility
        util.loadDump() is designed to import the
        combination of chunked output files and metadata produced by
        MySQL Shell's instance dump utility
        util.dumpInstance(), schema dump utility
        util.dumpSchemas(), and table dump utility
        util.dumpTables(). The parallel table import
        utility can be used in combination with the dump loading utility
        if you want to modify any of the data in the chunked output
        files before uploading it to the target server. To do this,
        first use the dump loading utility to load only the DDL for the
        selected table, to create the table on the target server. Then
        use the parallel table import utility to capture and transform
        data from the output files for the table, and import it to the
        target table. Repeat that process as necessary for any other
        tables where you want to modify the data. Finally, use the dump
        loading utility to load the DDL and data for any remaining
        tables that you do not want to modify, excluding the tables that
        you did modify. For a description of the procedure, see
        Modifying Dumped Data.
      
The parallel table import utility requires a classic connection to the target server. It does not currently support X Protocol connections.
        The parallel table import utility uses LOAD DATA LOCAL
        INFILE statements to upload data, so the
        local_infile system variable
        must be set to ON on the target server. You
        can do this by issuing the following statement in SQL mode
        before running the parallel table import utility:
      
SET GLOBAL local_infile = 1;
        To avoid a known potential security issue with LOAD
        DATA LOCAL, when the MySQL server replies to the
        parallel table import utility's LOAD DATA
        requests with file transfer requests, the utility only sends the
        predetermined data chunks, and ignores any specific requests
        attempted by the server. For more information, see
        Security Considerations for LOAD DATA LOCAL.
      
        The parallel table import utility requires an existing
        classic MySQL protocol connection to the target MySQL server. Each
        thread opens its own session to send chunks of the data to the
        MySQL server, or in the case of compressed files, to send
        multiple files in parallel. You can adjust the number of
        threads, number of bytes sent in each chunk, and maximum rate of
        data transfer per thread, to balance the load on the network and
        the speed of data transfer. The utility cannot operate over
        X Protocol connections, which do not support LOAD
        DATA statements.
      
        In the MySQL Shell API, the parallel table import utility is a
        function of the util global object, and has
        the following signature:
      
util.importTable({file_name | file_list}, options)
        options is a dictionary of import options
        that can be omitted if it is empty. The options are listed in
        the final section of this topic.
      
        file_name is a string specifying the name and
        path for a single file containing the data to be imported.
        Alternatively, file_list is an array of file
        paths specifying multiple data files. On Windows, backslashes
        must be escaped in file paths, or you can use forward slashes
        instead.
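For example, in MySQL Shell's Python mode, a Windows file path can be given with escaped backslashes or with forward slashes (the path, schema, and table names here are illustrative):

mysql-py> util.import_table("C:\\data\\productrange.csv", {"schema": "mydb", "table": "products"})
mysql-py> util.import_table("C:/data/productrange.csv", {"schema": "mydb", "table": "products"})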
      
            For files that are accessible to the client host on a local
            disk, you can prefix the directory path with the
            file:// scheme, or allow it to default to
            that. For files accessed in this way, file paths can contain
            the wildcards * (multiple characters) and
            ? (single character) for pattern
            matching. Note that if these wildcard characters are present
            in file paths, the utility always treats them as wildcards,
            and might therefore attempt an incorrect strategy for file
            transfer.
          
            For files that are accessible to the client host through
            HTTP or HTTPS, provide a URL or a list of URLs, prefixed
            with the http:// or
            https:// scheme as appropriate, in the
            format http[s]://host.domain[:port]/path.
            For files accessed in this way, pattern matching is not
            available. The HTTP server must support the Range request
            header, and must return the Content-Range response header to
            the client.
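For example, the following invocation in MySQL Shell's Python mode imports a single data file over HTTPS (the URL shown is illustrative):

mysql-py> util.import_table("https://example.com/exports/productrange.csv", {"schema": "mydb", "table": "products", "dialect": "csv-unix"})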
          
            For files in an Oracle Cloud Infrastructure Object Storage bucket, specify a path
            to the file in the bucket, and use the
            osBucketName option to specify the bucket
            name.
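For example, the following invocation in MySQL Shell's Python mode imports a file from an Object Storage bucket (the bucket name and file path shown are illustrative):

mysql-py> util.import_table("exports/productrange.csv", {"schema": "mydb", "table": "products", "osBucketName": "mybucket"})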
          
The function returns void; if an error occurs, an exception is thrown. If the import is stopped partway through by the user with Ctrl+C or by an error, the utility stops sending data. When the server finishes processing the data it received, messages are returned showing the chunk that was being imported by each thread at the time, the percentage complete, and the number of records that were updated in the target table.
        The following examples, the first in MySQL Shell's JavaScript
        mode and the second in MySQL Shell's Python mode, import the
        data in a single CSV file
        /tmp/productrange.csv to the
        products table in the mydb
        database, skipping a header row in the file:
      
mysql-js> util.importTable("/tmp/productrange.csv", {schema: "mydb", table: "products", dialect: "csv-unix", skipRows: 1, showProgress: true})
mysql-py> util.import_table("/tmp/productrange.csv", {"schema": "mydb", "table": "products", "dialect": "csv-unix", "skipRows": 1, "showProgress": True})
        The following example in MySQL Shell's Python mode only
        specifies the dialect for the CSV file. mydb
        is the active schema for the MySQL Shell session. The utility
        therefore imports the data in the file
        /tmp/productrange.csv to the
        productrange table in the
        mydb database:
      
mysql-py> \use mydb
mysql-py> util.import_table("/tmp/productrange.csv", {"dialect": "csv-unix"})
The following example in MySQL Shell's Python mode imports the data from multiple files, including a mix of individually named files, ranges of files specified using wildcard pattern matching, and compressed files:
mysql-py> util.import_table(
    [
        "data_a.csv",
        "data_b*",
        "data_c*",
        "data_d.tsv.zst",
        "data_e.tsv.zst",
        "data_f.tsv.gz",
        "/backup/replica3/2021_01_12/data_g.tsv",
        "/backup/replica3/2021_01_13/*.tsv",
    ],
    {"schema": "mydb", "table": "productrange"}
)
The parallel table import utility can also be invoked from the command line using the mysqlsh command interface. With this interface, you invoke the utility as in the following examples:
mysqlsh mysql://root:@127.0.0.1:3366 --ssl-mode=DISABLED -- util import-table /r/mytable.dump --schema=mydb --table=regions --bytes-per-chunk=10M --linesTerminatedBy=$'\r\n'
When you import multiple data files, ranges of files specified using wildcard pattern matching are expanded by MySQL Shell's glob pattern matching logic if they are quoted, as in the following example. Otherwise they are expanded by the pattern matching logic for the user shell where you entered the mysqlsh command.
mysqlsh mysql://root:@127.0.0.1:3366 -- util import-table data_a.csv "data_b*" data_d.tsv.zst --schema=mydb --table=productrange --osBucketName=mybucket
        Note that as shown in the above example, line feed characters
        must be passed using ANSI-C quoting in shells that support this
        function (such as bash,
        ksh, mksh, and
        zsh). For information on the
        mysqlsh command-line integration, see
        Section 5.8, “API Command Line Integration”.
      
The following import options are available for the parallel table import utility to specify how the data is imported:
schema:
            "db_name"
          
              The name of the target database on the connected MySQL
              server. If you omit this option, the utility attempts to
              identify and use the schema name in use for the current
              MySQL Shell session, as specified in a connection URI
              string, \use command, or MySQL Shell
              option. If the schema name is not specified and cannot be
              identified from the session, an error is returned.
            
table:
            "table_name"
          The name of the target relational table. If you omit this option, the utility assumes the table name is the name of the data file without the extension. The target table must exist in the target database.
columns: array of column
            names
          An array of strings containing column names from the import file or files, given in the order that they map to columns in the target relational table. Use this option if the imported data does not contain all the columns of the target table, or if the order of the fields in the imported data differs from the order of the columns in the table. If you omit this option, input lines are expected to contain a matching field for each column in the target table.
              You can use this option to capture columns from the import
              file or files for input preprocessing, in the same way as
              with a LOAD DATA statement.
              When you use an integer value in place of a column name in
              the array, that column in the import file or files is
              captured as a user variable
              @int, for
              example @1. The selected data can be
              discarded, or you can use the
              decodeColumns option to transform the
              data and assign it to a column in the target table.
            
              In this example in MySQL Shell's JavaScript mode, the
              second and fourth columns from the import file are
              assigned to the user variables @1 and
              @2, and no
              decodeColumns option is present to
              assign them to any column in the target table, so they are
              discarded.
            
mysql-js> util.importTable('file.txt', {
            table: 't1',
            columns: ['column1', 1, 'column2', 2, 'column3']
          });
decodeColumns:
            dictionary
          
              A dictionary of key-value pairs that assigns import file
              columns captured as user variables by the
              columns option to columns in the target
              table, and specifies preprocessing transformations for
              them in the same way as the SET clause
              of a LOAD DATA statement.
            
              In this example in MySQL Shell's JavaScript mode, the
              first input column from the data file is used as the first
              column in the target table. The second input column, which
              has been assigned to the variable @1 by
              the columns option, is subjected to a
              division operation before being used as the value of the
              second column in the target table.
            
mysql-js> util.importTable('file.txt', {
            columns: ['column1', 1],
            decodeColumns: {'column2': '@1 / 100'}
          });
In this example in MySQL Shell's JavaScript mode, the input columns from the data file are both assigned to variables, then transformed in various ways and used to populate the columns of the target table:
mysql-js> util.importTable('file.txt', {
            table: 't1',
            columns: [1, 2],
            decodeColumns: {
              'a': '@1',
              'b': '@2',
              'sum': '@1 + @2',
              'multiple': '@1 * @2',
              'power': 'POW(@1, @2)'
            }
          });
skipRows:
            number
          Skip this number of rows at the beginning of the import file, or in the case of multiple import files, at the beginning of every file included in the file list. You can use this option to omit an initial header line containing column names from the upload to the table. The default is that no rows are skipped.
replaceDuplicates: [true|false]
          
              Whether input rows that have the same value for a primary
              key or unique index as an existing row should be replaced
              (true) or skipped
              (false). The default is
              false.
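For example, the following invocation in MySQL Shell's Python mode replaces any existing rows that have the same primary key or unique index value as an input row (the file, schema, and table names are illustrative):

mysql-py> util.import_table("/tmp/productrange.csv", {"schema": "mydb", "table": "products", "replaceDuplicates": True})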
            
dialect: [default|csv|csv-unix|tsv|json]
          
              Use a set of field- and line-handling options appropriate
              for the specified file format. You can use the selected
              dialect as a base for further customization, by also
              specifying one or more of the
              linesTerminatedBy,
              fieldsTerminatedBy,
              fieldsEnclosedBy,
              fieldsOptionallyEnclosed, and
              fieldsEscapedBy options to change the
              settings. The default dialect maps to a file created using
              a
              SELECT...INTO
              OUTFILE statement with the default settings for
              that statement. This is the default for the output file
              produced by MySQL Shell's table export utility. Other
              dialects are available to suit CSV files (created on
              either DOS or UNIX systems), TSV files, and JSON data. The
              settings applied for each dialect are as follows:
            
Table 12.2 Dialect settings for parallel table import utility
| dialect | linesTerminatedBy | fieldsTerminatedBy | fieldsEnclosedBy | fieldsOptionallyEnclosed | fieldsEscapedBy |
|---|---|---|---|---|---|
| default | [LF] | [TAB] | [empty] | false | \ |
| csv | [CR][LF] | , | " | true | \ |
| csv-unix | [LF] | , | " | false | \ |
| tsv | [CR][LF] | [TAB] | " | true | \ |
| json | [LF] | [LF] | [empty] | false | [empty] |
The carriage return and line feed values for the dialects are operating system independent.
                    If you use the linesTerminatedBy,
                    fieldsTerminatedBy,
                    fieldsEnclosedBy,
                    fieldsOptionallyEnclosed, and
                    fieldsEscapedBy options,
                    depending on the escaping conventions of your
                    command interpreter, the backslash character (\)
                    might need to be doubled if you use it in the option
                    values.
                  
                    Like the MySQL server with the
                    LOAD DATA statement,
                    MySQL Shell does not validate the field- and
                    line-handling options that you specify. Inaccurate
                    selections for these options can cause data to be
                    imported into the wrong fields, partially, and/or
                    incorrectly. Always verify your settings before
                    starting the import, and verify the results
                    afterwards.
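For example, the following invocation in MySQL Shell's Python mode uses the csv dialect as a base, but overrides its line terminator for a file created on a UNIX system (the file, schema, and table names are illustrative):

mysql-py> util.import_table("/tmp/products.csv", {"schema": "mydb", "table": "products", "dialect": "csv", "linesTerminatedBy": "\n"})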
                  
linesTerminatedBy:
            "characters"
          
              One or more characters (or an empty string) that
              terminates each of the lines in the input data file or
              files. The default is as for the specified dialect, or a
              linefeed character (\n) if the dialect
              option is omitted. This option is equivalent to the
              LINES TERMINATED BY option for the
              LOAD DATA statement. Note
              that the utility does not provide an equivalent for the
              LINES STARTING BY option for the
              LOAD DATA statement, which is set to
              the empty string.
            
fieldsTerminatedBy:
            "characters"
          
              One or more characters (or an empty string) that
              terminates each of the fields in the input data file or
              files. The default is as for the specified dialect, or a
              tab character (\t) if the dialect
              option is omitted. This option is equivalent to the
              FIELDS TERMINATED BY option for the
              LOAD DATA statement.
            
fieldsEnclosedBy:
            "character"
          
              A single character (or an empty string) that encloses each
              of the fields in the input data file or files. The default
              is as for the specified dialect, or the empty string if
              the dialect option is omitted. This option is equivalent
              to the FIELDS ENCLOSED BY option for
              the LOAD DATA statement.
            
fieldsOptionallyEnclosed: [ true | false
            ]
          
              Whether the character given for
              fieldsEnclosedBy encloses all of the
              fields in the input data file or files
              (false), or encloses the fields only in
              some cases (true). The default is as
              for the specified dialect, or false if
              the dialect option is omitted. This option makes the
              fieldsEnclosedBy option equivalent to
              the FIELDS OPTIONALLY ENCLOSED BY
              option for the LOAD DATA
              statement.
            
fieldsEscapedBy:
            "character"
          
              The character that begins escape sequences in the input
              data file or files. If this is not provided, escape
              sequence interpretation does not occur. The default is as
              for the specified dialect, or a backslash (\) if the
              dialect option is omitted. This option is equivalent to
              the FIELDS ESCAPED BY option for the
              LOAD DATA statement.
            
characterSet:
            "charset"
          
              This option specifies a character set encoding with which
              the input data is interpreted during the import. Setting
              the option to binary means that no
              conversion is done during the import. When you omit this
              option, the import uses the character set specified by the
              character_set_database
              system variable to interpret the input data.
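For example, the following invocation in MySQL Shell's Python mode interprets the input data as latin1 rather than the database character set (the file, schema, and table names are illustrative):

mysql-py> util.import_table("/tmp/productrange.csv", {"schema": "mydb", "table": "products", "characterSet": "latin1"})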
            
bytesPerChunk:
            "size"
          
              For a list of multiple input data files, this option is
              not available. For a single input data file, this option
              specifies the number of bytes (plus any additional bytes
              required to reach the end of the row) that threads send
              for each LOAD DATA call to
              the target server. The utility distributes the data into
              chunks of this size for threads to pick up and send to the
              target server. The chunk size can be specified as a number
              of bytes, or using the suffixes k (kilobytes), M
              (megabytes), G (gigabytes). For example,
              bytesPerChunk="2k" makes threads send
              chunks of approximately 2 kilobytes. The minimum chunk
              size is 131072 bytes, and the default chunk size is 50M.
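For example, the following invocation in MySQL Shell's Python mode makes threads send chunks of approximately 10 megabytes (the file, schema, and table names are illustrative):

mysql-py> util.import_table("/tmp/productrange.csv", {"schema": "mydb", "table": "products", "bytesPerChunk": "10M"})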
            
threads:
            number
          The maximum number of parallel threads to use to send the data in the input file or files to the target server. If you do not specify a number of threads, the default maximum is 8. For a list of multiple input data files, the utility creates the specified or maximum number of threads. For a single input data file, the utility calculates an appropriate number of threads to create up to this maximum, using the following formula:
min(max(1, threads), chunks)
              where threads is the maximum number of
              threads, and chunks is the number of
              chunks that the data will be split into, which is
              calculated by dividing the file size by the
              bytesPerChunk size then adding 1. The
              calculation ensures that if the maximum number of threads
              exceeds the number of chunks that will actually be sent,
              the utility does not create more threads than necessary.
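              For example, with the default settings, a 120M input data
              file is divided into 120M / 50M + 1 = 3 chunks, so the
              utility creates min(max(1, 8), 3) = 3 threads rather than
              the default maximum of 8.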
            
Compressed files cannot be distributed into chunks, so instead the utility uses its parallel connections to upload multiple files at a time. If there is only one input data file, the upload of a compressed file can only use a single connection.
maxRate:
            "rate"
          
              The maximum limit on data throughput in bytes per second
              per thread. Use this option if you need to avoid
              saturating the network or the I/O or CPU for the client
              host or target server. The maximum rate can be specified
              as a number of bytes, or using the suffixes k (kilobytes),
              M (megabytes), G (gigabytes). For example,
              maxRate="5M" limits each thread to 5MB
              of data per second, which for eight threads gives a
              transfer rate of 40MB/second. The default is 0, meaning
              that there is no limit.
            
showProgress: [ true | false ]
          
              Display (true) or hide
              (false) progress information for the
              import. The default is true if stdout
              is a terminal (tty), and false
              otherwise.
            
sessionInitSql: list of
            strings
          A list of SQL statements to run at the start of each client session used for loading data into the target MySQL instance. You can use this option to change session variables. For example, the following statements skip binary logging on the target MySQL instance for the sessions used by the utility during the course of the import, and increase the number of threads available for index creation:
sessionInitSql: ["SET SESSION sql_log_bin=0;", "SET SESSION innodb_ddl_threads=8;"]
If an error occurs while running the SQL statements, the import stops and returns an error message.
MySQL Shell supports importing input data files stored in Oracle Cloud Infrastructure Object Storage buckets.
osBucketName:
            "string"
          
              The name of the Oracle Cloud Infrastructure Object Storage bucket where the
              input data file is located. By default, the
              [DEFAULT] profile in the Oracle Cloud
              Infrastructure CLI configuration file located at
              ~/.oci/config is used to establish a
              connection to the bucket. You can substitute an
              alternative profile to be used for the connection with the
              ociConfigFile and
              ociProfile options. For instructions to
              set up a CLI configuration file, see
              SDK
              and CLI Configuration File.
            
osNamespace:
            "string"
          
              The Oracle Cloud Infrastructure namespace where the Object Storage bucket named
              by osBucketName is located. The
              namespace for an Object Storage bucket is displayed in the
              Bucket Information tab of the bucket details page in the
              Oracle Cloud Infrastructure console, or can be obtained using the Oracle Cloud Infrastructure command
              line interface.
            
ociConfigFile:
            "string"
          
              An Oracle Cloud Infrastructure CLI configuration file that contains the profile
              to use for the connection, instead of the one in the
              default location ~/.oci/config.
            
ociProfile:
            "string"
          
              The profile name of the Oracle Cloud Infrastructure profile to use for the
              connection, instead of the [DEFAULT]
              profile in the Oracle Cloud Infrastructure CLI
              configuration file used for the connection.
            
ociAuth:
            "string"
          
              The authentication method to use when connecting to Oracle Cloud Infrastructure.
              This option requires osBucketName to be
              configured with a valid value. For an example, see the
              invocation following the list of methods below.
            
The following options are available:
                  api_key: OCI connections use the
                  OCI configuration file. See
                  Section 4.8.1, “Oracle Cloud Infrastructure Object Storage”.
                
                  If osBucketName is defined with a
                  valid value, but ociAuth is not
                  defined, api_key is the default
                  value used.
                
                  instance_principal: OCI connections
                  use instance principal authentication. See
                  Instance
                  Principal Authentication.
                
                  This option cannot be used if
                  ociConfigFile or
                  ociProfile is defined.
                
                  resource_principal: OCI connections
                  use resource principal authentication. See
                  Resource
                  Principal Authentication.
                
                  This option cannot be used if
                  ociConfigFile or
                  ociProfile is defined.
                
                  security_token: OCI connections use
                  a temporary, generated session token. See
                  Session
                  Token-Based Authentication.
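For example, the following invocation in MySQL Shell's Python mode authenticates using an instance principal (the bucket name and file path are illustrative):

mysql-py> util.import_table("exports/productrange.csv", {"schema": "mydb", "table": "products", "osBucketName": "mybucket", "ociAuth": "instance_principal"})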
                
MySQL Shell supports importing input data files stored in S3-compatible buckets, such as Amazon Web Services (AWS) S3.
MySQL Shell supports AWS S3 configuration in command line options, environment variables, and configuration files. Command line options override environment variables, configuration files, and default options.
For information on configuration requirements, see Section 4.8, “Cloud Service Configuration”.
s3BucketName:
            "string"
          
              The name of the S3 bucket where the input data files are
              located. By default, the default
              profile in the Amazon Web Services (AWS) CLI
              config and
              credentials files located at
              ~/.aws/ is used to establish a
              connection to the S3 bucket. You can substitute
              alternative configurations and credentials for the
              connection with the s3ConfigFile and
              s3CredentialsFile options. For
              instructions on installing and configuring the AWS CLI,
              see
              Getting
              started with the AWS CLI.
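For example, the following invocation in MySQL Shell's Python mode imports a file from an S3 bucket (the bucket name, region, and file path are illustrative):

mysql-py> util.import_table("exports/productrange.csv", {"schema": "mydb", "table": "products", "s3BucketName": "mybucket", "s3Region": "us-east-1"})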
            
s3CredentialsFile:
            "string"
          
              A credentials file that contains the user's credentials to
              use for the connection, instead of the one in the default
              location, ~/.aws/credentials.
              Typically, the credentials file contains the
              aws_access_key_id and
              aws_secret_access_key to use for the
              connection.
            
s3ConfigFile:
            "string"
          
              An AWS CLI configuration file that contains the profile to
              use for the connection, instead of the one in the default
              location ~/.aws/config. Typically,
              the config file contains the region and output type to use
              for the connection.
            
s3Profile:
            "string"
          
              The profile name of the s3 CLI profile to use for the
              connection, instead of the default
              profile in the AWS CLI configuration file used for the
              connection.
            
s3Region:
            "string"
          The name of the region to use for the connection.
s3EndpointOverride:
            "string"
          The URL of the endpoint to use instead of the default.
              When connecting to the Oracle Cloud Infrastructure S3 compatibility API, the
              endpoint takes the following format:
              https://namespace.compat.objectstorage.region.oraclecloud.com.
              Replace namespace with the
              Object Storage namespace and
              region with your region
              identifier. For example, the region identifier for the US
              East (Ashburn) region is us-ashburn-1.
For a namespace named axaxnpcrorw5 in the US East (Ashburn) region:
              https://axaxnpcrorw5.compat.objectstorage.us-ashburn-1.oraclecloud.com.
            
MySQL Shell supports importing from Microsoft Azure Blob Storage.
MySQL Shell supports Microsoft Azure Blob Storage configuration in command line options, environment variables, and configuration files. Command line options override environment variables and configuration files.
For information on configuration requirements and the order of precedence of the configuration types, see Section 4.8, “Cloud Service Configuration”.
azureContainerName:
            "string"
          Mandatory. The name of the Azure container from which the table is to be imported. The container must exist.
azureConfigFile:
            "string"
          
              Optional. A configuration file that contains the storage
              connection parameters, instead of the one in the default
              location, such as ~/.azure/config. If
              this is not defined, the default configuration file is
              used.
            
              azureContainerName must be defined and
              must not be empty.
            
azureStorageAccount:
            "string"
          Optional. The name of the Azure storage account to use for the operation.
azureStorageSasToken:
            "string"
          Optional. Azure Shared Access Signature (SAS) token to be used for the authentication of the operation, instead of a key.
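For example, the following invocation in MySQL Shell's Python mode imports a file from an Azure Blob Storage container (the container, file, schema, and table names are illustrative):

mysql-py> util.import_table("productrange.csv", {"schema": "mydb", "table": "products", "azureContainerName": "mycontainer"})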