InnoDB provides an optimization that
significantly improves scalability and performance of SQL
statements that insert rows into tables with
AUTO_INCREMENT columns. To use the
AUTO_INCREMENT mechanism with an
InnoDB table, an
AUTO_INCREMENT column
ai_col must be defined as part of an
index such that it is possible to perform the equivalent of an
indexed SELECT
MAX( lookup on the
table to obtain the maximum column value. Typically, this is
achieved by making the column the first column of some table
index.
ai_col)
This section provides background information on the original
(“traditional”) implementation of auto-increment
locking in InnoDB, explains the configurable
locking mechanism, documents the parameter for configuring the
mechanism, and describes its behavior and interaction with
replication.
The original implementation of auto-increment handling in
InnoDB uses the following strategy to prevent
problems when using the binary log for statement-based
replication or for certain recovery scenarios.
If you specify an AUTO_INCREMENT column for
an InnoDB table, the table handle in the
InnoDB data dictionary contains a special
counter called the auto-increment counter that is used in
assigning new values for the column. This counter is stored only
in main memory, not on disk.
InnoDB uses the following algorithm to
initialize the auto-increment counter for a table
t that contains an
AUTO_INCREMENT column named
ai_col: After a server startup, for the first
insert into a table t,
InnoDB executes the equivalent of this
statement:
SELECT MAX(ai_col) FROM t FOR UPDATE;
InnoDB increments the value retrieved by the
statement and assigns it to the column and to the auto-increment
counter for the table. By default, the value is incremented by
one. This default can be overridden by the
auto_increment_increment
configuration setting.
If the table is empty, InnoDB uses the value
1. This default can be overridden by the
auto_increment_offset
configuration setting.
If a SHOW TABLE STATUS statement
examines the table t before the
auto-increment counter is initialized, InnoDB
initializes but does not increment the value and stores it for
use by later inserts. This initialization uses a normal
exclusive-locking read on the table and the lock lasts to the
end of the transaction.
InnoDB follows the same procedure for
initializing the auto-increment counter for a freshly created
table.
After the auto-increment counter has been initialized, if you do
not explicitly specify a value for an
AUTO_INCREMENT column,
InnoDB increments the counter and assigns the
new value to the column. If you insert a row that explicitly
specifies the column value, and the value is bigger than the
current counter value, the counter is set to the specified
column value.
If a user specifies NULL or
0 for the AUTO_INCREMENT
column in an INSERT,
InnoDB treats the row as if the value was not
specified and generates a new value for it.
The behavior of the auto-increment mechanism is not defined if you assign a negative value to the column, or if the value becomes bigger than the maximum integer that can be stored in the specified integer type.
When accessing the auto-increment counter,
InnoDB uses a special table-level
AUTO-INC lock that it keeps to the end of the
current SQL statement, not to the end of the transaction. The
special lock release strategy was introduced to improve
concurrency for inserts into a table containing an
AUTO_INCREMENT column. Nevertheless, two
transactions cannot have the AUTO-INC lock on
the same table simultaneously, which can have a performance
impact if the AUTO-INC lock is held for a
long time. That might be the case for a statement such as
INSERT INTO t1 ... SELECT ... FROM t2 that
inserts all rows from one table into another.
InnoDB uses the in-memory auto-increment
counter as long as the server runs. When the server is stopped
and restarted, InnoDB reinitializes the
counter for each table for the first
INSERT to the table, as described
earlier.
A server restart also cancels the effect of the
AUTO_INCREMENT =
table option in NCREATE TABLE and
ALTER TABLE statements, which you
can use with InnoDB tables to set the initial
counter value or alter the current counter value.
You may see gaps in the sequence of values assigned to the
AUTO_INCREMENT column if you roll back
transactions that have generated numbers using the counter.
As described in the previous section, InnoDB
uses a special lock called the table-level
AUTO-INC lock for inserts into tables with
AUTO_INCREMENT columns. This lock is normally
held to the end of the statement (not to the end of the
transaction), to ensure that auto-increment numbers are assigned
in a predictable and repeatable order for a given sequence of
INSERT statements.
In the case of statement-based replication, this means that when
an SQL statement is replicated on a slave server, the same
values are used for the auto-increment column as on the master
server. The result of execution of multiple
INSERT statements is
deterministic, and the slave reproduces the same data as on the
master. If auto-increment values generated by multiple
INSERT statements were
interleaved, the result of two concurrent
INSERT statements would be
nondeterministic, and could not reliably be propagated to a
slave server using statement-based replication.
To make this clear, consider an example that uses this table:
CREATE TABLE t1 ( c1 INT(11) NOT NULL AUTO_INCREMENT, c2 VARCHAR(10) DEFAULT NULL, PRIMARY KEY (c1) ) ENGINE=InnoDB;
Suppose that there are two transactions running, each inserting
rows into a table with an AUTO_INCREMENT
column. One transaction is using an
INSERT ...
SELECT statement that inserts 1000 rows, and another
is using a simple INSERT
statement that inserts one row:
Tx1: INSERT INTO t1 (c2) SELECT 1000 rows from another table ...
Tx2: INSERT INTO t1 (c2) VALUES ('xxx');
InnoDB cannot tell in advance how many rows
will be retrieved from the SELECT
in the INSERT statement in Tx1,
and it assigns the auto-increment values one at a time as the
statement proceeds. With a table-level lock, held to the end of
the statement, only one INSERT
statement referring to table t1 can execute
at a time, and the generation of auto-increment numbers by
different statements is not interleaved. The auto-increment
value generated by the Tx1
INSERT ...
SELECT statement will be consecutive, and the (single)
auto-increment value used by the
INSERT statement in Tx2 will
either be smaller or larger than all those used for Tx1,
depending on which statement executes first.
As long as the SQL statements execute in the same order when
replayed from the binary log (when using statement-based
replication, or in recovery scenarios), the results will be the
same as they were when Tx1 and Tx2 first ran. Thus, table-level
locks held until the end of a statement make
INSERT statements using
auto-increment safe for use with statement-based replication.
However, those locks limit concurrency and scalability when
multiple transactions are executing insert statements at the
same time.
In the preceding example, if there were no table-level lock, the
value of the auto-increment column used for the
INSERT in Tx2 depends on
precisely when the statement executes. If the
INSERT of Tx2 executes while the
INSERT of Tx1 is running (rather
than before it starts or after it completes), the specific
auto-increment values assigned by the two
INSERT statements are
nondeterministic, and may vary from run to run.
InnoDB can avoid using the table-level
AUTO-INC lock for a class of
INSERT statements where the
number of rows is known in advance, and still preserve
deterministic execution and safety for statement-based
replication. Further, if you are not using the binary log to
replay SQL statements as part of recovery or replication, you
can entirely eliminate use of the table-level
AUTO-INC lock for even greater concurrency
and performance, at the cost of permitting gaps in
auto-increment numbers assigned by a statement and potentially
having the numbers assigned by concurrently executing statements
interleaved.
For INSERT statements where the
number of rows to be inserted is known at the beginning of
processing the statement, InnoDB quickly
allocates the required number of auto-increment values without
taking any lock, but only if there is no concurrent session
already holding the table-level AUTO-INC lock
(because that other statement will be allocating auto-increment
values one-by-one as it proceeds). More precisely, such an
INSERT statement obtains
auto-increment values under the control of a mutex (a
light-weight lock) that is not held until
the statement completes, but only for the duration of the
allocation process.
This new locking scheme enables much greater scalability, but it
does introduce some subtle differences in how auto-increment
values are assigned compared to the original mechanism. To
describe the way auto-increment works in
InnoDB, the following discussion defines some
terms, and explains how InnoDB behaves using
different settings of the
innodb_autoinc_lock_mode
configuration parameter, which you can set at server startup.
Additional considerations are described following the
explanation of auto-increment locking behavior.
First, some definitions:
“INSERT-like”
statements
All statements that generate new rows in a table, including
INSERT,
INSERT ...
SELECT, REPLACE,
REPLACE ...
SELECT, and LOAD
DATA.
“Simple inserts”
Statements for which the number of rows to be inserted can
be determined in advance (when the statement is initially
processed). This includes single-row and multiple-row
INSERT and
REPLACE statements that do
not have a nested subquery, but not
INSERT
... ON DUPLICATE KEY UPDATE.
“Bulk inserts”
Statements for which the number of rows to be inserted (and
the number of required auto-increment values) is not known
in advance. This includes
INSERT ...
SELECT,
REPLACE ...
SELECT, and LOAD
DATA statements, but not plain
INSERT. InnoDB will
assign new values for the AUTO_INCREMENT
column one at a time as each row is processed.
“Mixed-mode inserts”
These are “simple insert” statements that
specify the auto-increment value for some (but not all) of
the new rows. An example follows, where
c1 is an
AUTO_INCREMENT column of table
t1:
INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');
Another type of “mixed-mode insert” is
INSERT
... ON DUPLICATE KEY UPDATE, which in the worst
case is in effect an INSERT
followed by a UPDATE, where
the allocated value for the
AUTO_INCREMENT column may or may not be
used during the update phase.
There are three possible settings for the
innodb_autoinc_lock_mode
parameter:
innodb_autoinc_lock_mode = 0
(“traditional” lock mode)
This lock mode provides the same behavior as before
innodb_autoinc_lock_mode
existed. For all
“INSERT-like”
statements, a special table-level
AUTO-INC lock is obtained and held to the
end of the statement. This assures that the auto-increment
values assigned by any given statement are consecutive.
This lock mode is provided for:
Backward compatibility.
Performance testing.
Working around issues with “mixed-mode inserts”, due to the possible differences in semantics described later.
innodb_autoinc_lock_mode = 1
(“consecutive” lock mode)
This is the default lock mode. In this mode, “bulk
inserts” use the special AUTO-INC
table-level lock and hold it until the end of the statement.
This applies to all
INSERT ...
SELECT,
REPLACE ...
SELECT, and LOAD
DATA statements. Only one statement holding the
AUTO-INC lock can execute at a time.
With this lock mode, “simple inserts” (only)
use a new locking model where a light-weight mutex is used
during the allocation of auto-increment values, and no
table-level AUTO-INC lock is used, unless
an AUTO-INC lock is held by another
transaction. If another transaction does hold an
AUTO-INC lock, a “simple
insert” waits for the AUTO-INC
lock, as if it too were a “bulk insert”.
This lock mode ensures that, in the presence of
INSERT statements where the
number of rows is not known in advance (and where
auto-increment numbers are assigned as the statement
progresses), all auto-increment values assigned by any
“INSERT-like”
statement are consecutive, and operations are safe for
statement-based replication.
Simply put, the important impact of this lock mode is significantly better scalability. This mode is safe for use with statement-based replication. Further, as with “traditional” lock mode, auto-increment numbers assigned by any given statement are consecutive. In this mode, there is no change in semantics compared to “traditional” mode for any statement that uses auto-increment, with one important exception.
The exception is for “mixed-mode inserts”,
where the user provides explicit values for an
AUTO_INCREMENT column for some, but not
all, rows in a multiple-row “simple insert”.
For such inserts, InnoDB will allocate
more auto-increment values than the number of rows to be
inserted. However, all values automatically assigned are
consecutively generated (and thus higher than) the
auto-increment value generated by the most recently executed
previous statement. “Excess” numbers are lost.
innodb_autoinc_lock_mode = 2
(“interleaved” lock mode)
In this lock mode, no
“INSERT-like”
statements use the table-level AUTO-INC
lock, and multiple statements can execute at the same time.
This is the fastest and most scalable lock mode, but it is
not safe when using statement-based
replication or recovery scenarios when SQL statements are
replayed from the binary log.
In this lock mode, auto-increment values are guaranteed to
be unique and monotonically increasing across all
concurrently executing
“INSERT-like”
statements. However, because multiple statements can be
generating numbers at the same time (that is, allocation of
numbers is interleaved across
statements), the values generated for the rows inserted by
any given statement may not be consecutive.
If the only statements executing are “simple inserts” where the number of rows to be inserted is known ahead of time, there will be no gaps in the numbers generated for a single statement, except for “mixed-mode inserts”. However, when “bulk inserts” are executed, there may be gaps in the auto-increment values assigned by any given statement.
The auto-increment locking modes provided by
innodb_autoinc_lock_mode have
several usage implications:
Using auto-increment with replication
If you are using statement-based replication, set
innodb_autoinc_lock_mode to
0 or 1 and use the same value on the master and its slaves.
Auto-increment values are not ensured to be the same on the
slaves as on the master if you use
innodb_autoinc_lock_mode =
2 (“interleaved”) or configurations where the
master and slaves do not use the same lock mode.
If you are using row-based replication, all of the auto-increment lock modes are safe. Row-based replication is not sensitive to the order of execution of the SQL statements.
“Lost” auto-increment values and sequence gaps
In all lock modes (0, 1, and 2), if a transaction that
generated auto-increment values rolls back, those
auto-increment values are “lost”. Once a value
is generated for an auto-increment column, it cannot be
rolled back, whether or not the
“INSERT-like”
statement is completed, and whether or not the containing
transaction is rolled back. Such lost values are not reused.
Thus, there may be gaps in the values stored in an
AUTO_INCREMENT column of a table.
Gaps in auto-increment values for “bulk inserts”
With
innodb_autoinc_lock_mode
set to 0 (“traditional”) or 1
(“consecutive”), the auto-increment values
generated by any given statement will be consecutive,
without gaps, because the table-level
AUTO-INC lock is held until the end of
the statement, and only one such statement can execute at a
time.
With
innodb_autoinc_lock_mode
set to 2 (“interleaved”), there may be gaps in
the auto-increment values generated by “bulk
inserts,” but only if there are concurrently
executing
“INSERT-like”
statements.
For lock modes 1 or 2, gaps may occur between successive statements because for bulk inserts the exact number of auto-increment values required by each statement may not be known and overestimation is possible.
Auto-increment values assigned by “mixed-mode inserts”
Consider a “mixed-mode insert,” where a
“simple insert” specifies the auto-increment
value for some (but not all) resulting rows. Such a
statement will behave differently in lock modes 0, 1, and 2.
For example, assume c1 is an
AUTO_INCREMENT column of table
t1, and that the most recent
automatically generated sequence number is 100. Consider the
following “mixed-mode insert” statement:
INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');
With
innodb_autoinc_lock_mode
set to 0 (“traditional”), the four new rows
will be:
+-----+------+ | c1 | c2 | +-----+------+ | 1 | a | | 101 | b | | 5 | c | | 102 | d | +-----+------+
The next available auto-increment value will be 103 because
the auto-increment values are allocated one at a time, not
all at once at the beginning of statement execution. This
result is true whether or not there are concurrently
executing
“INSERT-like”
statements (of any type).
With
innodb_autoinc_lock_mode
set to 1 (“consecutive”), the four new rows
will also be:
+-----+------+ | c1 | c2 | +-----+------+ | 1 | a | | 101 | b | | 5 | c | | 102 | d | +-----+------+
However, in this case, the next available auto-increment
value will be 105, not 103 because four auto-increment
values are allocated at the time the statement is processed,
but only two are used. This result is true whether or not
there are concurrently executing
“INSERT-like”
statements (of any type).
With
innodb_autoinc_lock_mode
set to mode 2 (“interleaved”), the four new
rows will be:
+-----+------+ | c1 | c2 | +-----+------+ | 1 | a | |x| b | | 5 | c | |y| d | +-----+------+
The values of x and
y will be unique and larger than
any previously generated rows. However, the specific values
of x and
y will depend on the number of
auto-increment values generated by concurrently executing
statements.
Finally, consider the following statement, issued when the most-recently generated sequence number was the value 4:
INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');
With any
innodb_autoinc_lock_mode
setting, this statement will generate a duplicate-key error
23000 (Can't write; duplicate key in
table) because 5 will be allocated for the row
(NULL, 'b') and insertion of the row
(5, 'c') will fail.