8.2.4 Optimizing INFORMATION_SCHEMA Queries

Applications that monitor the database can make frequent use of the INFORMATION_SCHEMA tables. Certain types of queries for INFORMATION_SCHEMA tables can be optimized to execute more quickly. The goal is to minimize file operations (for example, scanning a directory or opening a table file) to collect the information that makes up these dynamic tables. These optimizations do have an effect on how collations are used for searches in INFORMATION_SCHEMA tables. For more information, see Section 10.1.7.9, “Collation and INFORMATION_SCHEMA Searches”.

1) Try to use constant lookup values for database and table names in the WHERE clause

You can take advantage of this principle as follows:

This principle applies to the INFORMATION_SCHEMA tables shown in the following table, which shows the columns for which a constant lookup value enables the server to avoid a directory scan. For example, if you are selecting from TABLES, using a constant lookup value for TABLE_SCHEMA in the WHERE clause enables a data directory scan to be avoided.

TableColumn to specify to avoid data directory scanColumn to specify to avoid database directory scan
COLUMNSTABLE_SCHEMATABLE_NAME
KEY_COLUMN_USAGETABLE_SCHEMATABLE_NAME
PARTITIONSTABLE_SCHEMATABLE_NAME
REFERENTIAL_CONSTRAINTSCONSTRAINT_SCHEMATABLE_NAME
STATISTICSTABLE_SCHEMATABLE_NAME
TABLESTABLE_SCHEMATABLE_NAME
TABLE_CONSTRAINTSTABLE_SCHEMATABLE_NAME
TRIGGERSEVENT_OBJECT_SCHEMAEVENT_OBJECT_TABLE
VIEWSTABLE_SCHEMATABLE_NAME

The benefit of a query that is limited to a specific constant database name is that checks need be made only for the named database directory. Example:

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'test';

Use of the literal database name test enables the server to check only the test database directory, regardless of how many databases there might be. By contrast, the following query is less efficient because it requires a scan of the data directory to determine which database names match the pattern 'test%':

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA LIKE 'test%';

For a query that is limited to a specific constant table name, checks need be made only for the named table within the corresponding database directory. Example:

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';

Use of the literal table name t1 enables the server to check only the files for the t1 table, regardless of how many tables there might be in the test database. By contrast, the following query requires a scan of the test database directory to determine which table names match the pattern 't%':

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME LIKE 't%';

The following query requires a scan of the database directory to determine matching database names for the pattern 'test%', and for each matching database, it requires a scan of the database directory to determine matching table names for the pattern 't%':

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'test%' AND TABLE_NAME LIKE 't%';

2) Write queries that minimize the number of table files that must be opened

For queries that refer to certain INFORMATION_SCHEMA table columns, several optimizations are available that minimize the number of table files that must be opened. Example:

SELECT TABLE_NAME, ENGINE FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'test';

In this case, after the server has scanned the database directory to determine the names of the tables in the database, those names become available with no further file system lookups. Thus, TABLE_NAME requires no files to be opened. The ENGINE (storage engine) value can be determined by opening the table's .frm file, without touching other table files such as the .MYD or .MYI file.

Some values, such as INDEX_LENGTH for MyISAM tables, require opening the .MYD or .MYI file as well.

The file-opening optimization types are denoted thus:

The following list indicates how the preceding optimization types apply to INFORMATION_SCHEMA table columns. For tables and columns not named, none of the optimizations apply.

3) Use EXPLAIN to determine whether the server can use INFORMATION_SCHEMA optimizations for a query

This applies particularly for INFORMATION_SCHEMA queries that search for information from more than one database, which might take a long time and impact performance. The Extra value in EXPLAIN output indicates which, if any, of the optimizations described earlier the server can use to evaluate INFORMATION_SCHEMA queries. The following examples demonstrate the kinds of information you can expect to see in the Extra value.

mysql> EXPLAIN SELECT TABLE_NAME FROM INFORMATION_SCHEMA.VIEWS WHERE
    -> TABLE_SCHEMA = 'test' AND TABLE_NAME = 'v1'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: VIEWS
         type: ALL
possible_keys: NULL
          key: TABLE_SCHEMA,TABLE_NAME
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Open_frm_only; Scanned 0 databases

Use of constant database and table lookup values enables the server to avoid directory scans. For references to VIEWS.TABLE_NAME, only the .frm file need be opened.

mysql> EXPLAIN SELECT TABLE_NAME, ROW_FORMAT FROM INFORMATION_SCHEMA.TABLES\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TABLES
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Open_full_table; Scanned all databases

No lookup values are provided (there is no WHERE clause), so the server must scan the data directory and each database directory. For each table thus identified, the table name and row format are selected. TABLE_NAME requires no further table files to be opened (the SKIP_OPEN_TABLE optimization applies). ROW_FORMAT requires all table files to be opened (OPEN_FULL_TABLE applies). EXPLAIN reports OPEN_FULL_TABLE because it is more expensive than SKIP_OPEN_TABLE.

mysql> EXPLAIN SELECT TABLE_NAME, TABLE_TYPE FROM INFORMATION_SCHEMA.TABLES
    -> WHERE TABLE_SCHEMA = 'test'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TABLES
         type: ALL
possible_keys: NULL
          key: TABLE_SCHEMA
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Open_frm_only; Scanned 1 database

No table name lookup value is provided, so the server must scan the test database directory. For the TABLE_NAME and TABLE_TYPE columns, the SKIP_OPEN_TABLE and OPEN_FRM_ONLY optimizations apply, respectively. EXPLAIN reports OPEN_FRM_ONLY because it is more expensive.

mysql> EXPLAIN SELECT B.TABLE_NAME
    -> FROM INFORMATION_SCHEMA.TABLES AS A, INFORMATION_SCHEMA.COLUMNS AS B
    -> WHERE A.TABLE_SCHEMA = 'test'
    -> AND A.TABLE_NAME = 't1'
    -> AND B.TABLE_NAME = A.TABLE_NAME\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: A
         type: ALL
possible_keys: NULL
          key: TABLE_SCHEMA,TABLE_NAME
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Skip_open_table; Scanned 0 databases
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: B
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Open_frm_only; Scanned all databases;
               Using join buffer

For the first EXPLAIN output row: Constant database and table lookup values enable the server to avoid directory scans for TABLES values. References to TABLES.TABLE_NAME require no further table files.

For the second EXPLAIN output row: All COLUMNS table values are OPEN_FRM_ONLY lookups, so COLUMNS.TABLE_NAME requires the .frm file to be opened.

mysql> EXPLAIN SELECT * FROM INFORMATION_SCHEMA.COLLATIONS\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: COLLATIONS
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra:

In this case, no optimizations apply because COLLATIONS is not one of the INFORMATION_SCHEMA tables for which optimizations are available.