2 In-Memory Column Store Architecture

The In-Memory Column Store (IM column store) stores tables and partitions in memory using a columnar format optimized for rapid scans. Oracle Database uses a sophisticated architecture to manage data in columnar and row formats simultaneously.

Dual-Format: Column and Row

When you enable an IM column store, the SGA manages data in separate locations: the In-Memory Area and the database buffer cache.

The IM column store encodes data in a columnar format: each column is a separate structure. The columns are stored contiguously, which optimizes them for analytic queries. The database buffer cache can modify objects that are also populated in the IM column store. However, the buffer cache stores data in the traditional row format. Data blocks store the rows contiguously, optimizing them for transactions.

The following figure illustrates the difference between row-based storage and columnar storage.

Figure 2-1 Columnar and Row-Based Storage

Description of Figure 2-1 follows
Description of "Figure 2-1 Columnar and Row-Based Storage"

Columnar Data in the In-Memory Area

The In-Memory Area is an optional SGA component that contains the IM column store.

Memory Pools in the In-Memory Area

The In-Memory Area is divided into subpools for columnar data and metadata.

These pools can be seen in the Example 2-1

  • The columnar data pool

    This subpool stores the IMCUs, which contain the columnar data. The V$INMEMORY_AREA.POOL column identifies this subpool as 1MB POOL.

  • The metadata pool

    This subpool stores metadata about the objects that reside in the IM column store. The V$INMEMORY_AREA.POOL column identifies this subpool as 64KB POOL.

  • IM pool metadata

    This subpool stores other specialized metadata structures which are not suitable to be stored in the 64K POOL. It's listed as IM POOL METADATA in V$INMEMORY_AREA.POOL

The database determines the relative size of the two subpools. It allocates the majority of space in the In-Memory Area to the columnar data pool (1 MB pool).

Note:

Oracle Database automatically determines the subpool sizes. You cannot change the space allocations.

Example 2-1 V$INMEMORY_AREA View

This example queries the V$INMEMORY_AREA view to determine the amount of available memory in each subpool (sample output included):

SELECT pool, alloc_bytes, used_bytes, populate_status 
FROM V$INMEMORY_AREA;

POOL              ALLOC_BYTES    USED_BYTES    POPULATE_STATUS
1MB POOL          796,917,760    680,525,824   DONE
64KB POOL         25,165,824     4,128,768     DONE
IM POOL METADATA  16,777,216     16,777,216    DONE

The current size of the In-Memory area is visible in V$SGA:

SELECT name, value/(1024*1024*1024) "SIZE_IN_GB"
FROM   V$SGA 
WHERE  NAME LIKE '%Mem%';

NAME                 SIZE_IN_GB
-------------------- ----------
In-Memory Area       10

In this example, you can see that the database uses a small percentage of memory for internal management structures.

See Also:

Oracle Database Reference to learn about V$INMEMORY_AREA

Row Data in the Database Buffer Cache

The database buffer cache stores and processes data blocks in the same way whether the IM column store is enabled or disabled. Buffer I/O and buffer pools function the same.

The IM column store enables data to be simultaneously populated in the SGA in both the traditional row format (the buffer cache) and the columnar format. The database transparently sends OLTP queries (such as primary key lookups) to the buffer cache, and analytic and reporting queries to the IM column store. When fetching data, Oracle Database can also read data from both memory areas within the same query.

Note:

In the execution plan, the operation TABLE ACCESS IN MEMORY FULL indicates that some or all data is accessed in the IM column store.

The dual-format architecture does not double memory requirements. The buffer cache is optimized to run with a much smaller size than the size of the database.

The following figure shows a sample IM column store. The database stores the sh.sales table on disk in traditional row format. The SGA stores the data in columnar format in the IM column store, and in row format in the database buffer cache.

Every on-disk data format for permanent, heap-organized tables is supported by the IM column store. The columnar format does not affect the format of data stored in data files or in the buffer cache, nor does it affect undo data and online redo logging.

The database processes DML modifications in the same way, regardless of whether the IM column store is enabled, by updating the buffer cache, online redo log, and undo tablespace. However, the database uses an internal mechanism to track changes and ensure that the IM column store is consistent with the rest of the database. For example, if the sales table is populated in the IM column store, and if an application updates a row in sales, then the database automatically keeps the copy of the sales table in the IM column store transactionally consistent. A query that accesses the IM column store always returns the same results for a query that accesses the buffer cache.

See Also:

Oracle Database Concepts to learn more about the database buffer cache

Size of the In-Memory Area

The In-Memory Area within the System Global Area (SGA) for a database instance is controlled by the INMEMORY_SIZE initialization parameter. By default, the size of the In-Memory Area is 0, which means the IM column store is disabled. The In-Memory Area can also be resized dynamically, either through automatic management or manually.

To enable the IM column store, set the In-Memory Area to a value greater than zero. The size is shown in V$SGA.

The In-Memory Area and SGA_TARGET

The In-Memory Area is subtracted from the SGA_TARGET initialization parameter setting. For example, if you set SGA_TARGET to 10 GB, and if you set the INMEMORY_SIZE to 4 GB, then 40% of the SGA_TARGET setting is allocated to the In-Memory Area. The following graphic illustrates the relationship.

Figure 2-3 INMEMORY_SIZE and SGA_TARGET

Description of Figure 2-3 follows
Description of "Figure 2-3 INMEMORY_SIZE and SGA_TARGET"

As of Oracle Database 23ai when ASMM (Automatic Shared Memory Management) is enabled, the In Memory Area is controlled by automatic memory management, like the other components of the SGA, including the buffer cache and the shared pool. ASMM provides dynamic resizing of the physical In-Memory column store based on the workload.

Automatic memory management requires that AIM (Automatic In-Memory) level is set to MEDIUM or HIGH. Use the INMEMORY_SIZE parameter to set the minimum size of the IM column store.

If ASMM is not enabled, then the In-Memory Area size is not controlled by automatic memory management. The database does not automatically shrink the In-Memory Area when the buffer cache or shared pool requires more memory, or increase the In-Memory Area when it runs out of space. In this case, you can change the physical size of the column store by increasing or decreasing the value of INMEMORY_SIZE before instance startup. This parameter can also be dynamically modified during runtime at the CDB and PDB levels.

In-Memory Storage Units

The IM column store manages both data and metadata in optimized storage units, not in traditional Oracle data blocks.

Oracle Database maintains the storage units in the In-Memory Area. The following graphic gives an overview of the In-Memory Area and the database processes that interact with it. The remaining chapter describes the various memory components.

Figure 2-4 IM Column Store: Memory and Process Architecture

Description of Figure 2-4 follows
Description of "Figure 2-4 IM Column Store: Memory and Process Architecture"

In-Memory Compression Units (IMCUs)

An In-Memory Compression Unit (IMCU) is a compressed, read-only storage unit that contains data for one or more columns.

An IMCU is analogous to a tablespace extent. An IMCU has two parts: a set of Column Compression Units (CUs), and a header that contains metadata such as the IM storage index.

IMCUs and Schema Objects

The IM column store stores data for a single object (table, partition, materialized view) in a set of IMCUs. An IMCU stores columnar data for one and only one object.

IMCUs and INMEMORY Columns

For an object specified as INMEMORY, every column listed in the INMEMORY clause is included in every IMCU.

For example, the sh.sales table has 7 columns. The following DDL statement specifies the table as INMEMORY, which means that every IMCU for sales includes columnar data for these 7 columns:

ALTER TABLE sh.sales INMEMORY MEMCOMPRESS FOR QUERY LOW;

NO INMEMORY Columns in INMEMORY Objects

You can specify that some but not all columns in an INMEMORY table have the INMEMORY attribute. For example, the sh.customers table has 23 columns. The following DDL statement specifies that 15 columns in sh.customers have the NO INMEMORY attribute, which means that the other 8 columns in the table have the INMEMORY attribute:

ALTER TABLE sh.customers INMEMORY 
  MEMCOMPRESS FOR QUERY LOW
  NO INMEMORY ( cust_gender, cust_year_of_birth, cust_marital_status,
                cust_postal_code, cust_city, cust_state_province,
                cust_main_phone_number, cust_income_level, cust_credit_limit,
                cust_email, cust_total, cust_total_id, cust_eff_from,
                cust_eff_to, cust_valid );

The following query shows the compression levels of the columns in sh.customers, indicating which columns are NO INMEMORY:

SET LINESIZE 200
COL TABLE_NAME FORMAT a25
COL SEG_COL_ID FORMAT 999
COL COLUMN_NAME FORMAT a25
COL INMEMORY_COMPRESSION FORMAT a11
SELECT SEGMENT_COLUMN_ID AS "SEG_COL_ID", COLUMN_NAME, INMEMORY_COMPRESSION 
FROM   V$IM_COLUMN_LEVEL WHERE TABLE_NAME = 'CUSTOMERS'
ORDER BY SEG_COL_ID;

SEG_COL_ID COLUMN_NAME               INMEMORY_CO
---------- ------------------------- -----------
         1 CUST_ID                   DEFAULT
         2 CUST_FIRST_NAME           DEFAULT
         3 CUST_LAST_NAME            DEFAULT
         4 CUST_GENDER               NO INMEMORY
         5 CUST_YEAR_OF_BIRTH        NO INMEMORY
         6 CUST_MARITAL_STATUS       NO INMEMORY
         7 CUST_STREET_ADDRESS       DEFAULT
         8 CUST_POSTAL_CODE          NO INMEMORY
         9 CUST_CITY                 NO INMEMORY
        10 CUST_CITY_ID              DEFAULT
        11 CUST_STATE_PROVINCE       NO INMEMORY
        12 CUST_STATE_PROVINCE_ID    DEFAULT
        13 COUNTRY_ID                DEFAULT
        14 CUST_MAIN_PHONE_NUMBER    NO INMEMORY
        15 CUST_INCOME_LEVEL         NO INMEMORY
        16 CUST_CREDIT_LIMIT         NO INMEMORY
        17 CUST_EMAIL                NO INMEMORY
        18 CUST_TOTAL                NO INMEMORY
        19 CUST_TOTAL_ID             NO INMEMORY
        20 CUST_SRC_ID               DEFAULT
        21 CUST_EFF_FROM             NO INMEMORY
        22 CUST_EFF_TO               NO INMEMORY
        23 CUST_VALID                NO INMEMORY

The following graphic represents three tables from the sh schema populated in the IM column store: customers, products, and sales. In this example, each table has a different number of columns specified INMEMORY. The IMCUs for each table include only data for the INMEMORY columns.

Queries That Reference NO INMEMORY Columns

When a query references a NO INMEMORY column, the table scan retrieves data from the row store rather than the IMCUs in the IM column store. Row store access occurs even if all other columns referenced in the query are populated INMEMORY columns.

For example, assume that the customers table is populated into the IM column store. The cust_id and cust_last_name columns are specified INMEMORY, but the cust_postal_code column is specified as NO INMEMORY. You issue the following query:

SELECT cust_id, cust_last_name, cust_postal_code 
FROM   customers 
WHERE  cust_id < 5001
ORDER BY cust_id;

In this case, the database accesses the row store, not the IM column store, even though cust_postal_code is the only NO INMEMORY column referenced in the query. The following query, which has cust_postal_code in the predicate but not the SELECT list, must also access the row store:

SELECT cust_id, cust_last_name 
FROM   customers 
WHERE  cust_postal_code = 77501
ORDER BY cust_id;

See Also:

In-Memory Compression

The IM column store uses special compression formats optimized for access speed rather than storage reduction. The columnar format enables queries to execute directly against the compressed columns.

Compression enables scanning and filtering operations to process a much smaller amount of data, which optimizes query performance. Oracle Database only decompresses data when it is required for the result set.

The compression applied in the IM column store is closely related to Hybrid Columnar Compression. Both technologies process column vectors. The primary difference is that the column vectors for the IM column store are optimized for SIMD vector processing, whereas the column vectors for Hybrid Columnar Compression are optimized for disk storage.

When you manually enable an object for population into the IM column store, you specify the type of compression in the INMEMORY MEMCOMPRESS clause: FOR DML, FOR QUERY (LOW or HIGH), FOR CAPACITY (LOW or HIGH), or NONE. When INMEMORY_AUTOMATIC_LEVEL is HIGH, the database automatically applies MEMCOMPRESS AUTO to all objects.

See Also:

IMCUs and Rows

Each IMCU contains all column values (including nulls) for a subset of rows in a table segment. A subset of rows is called a granule.

All IMCUs for a given segment contain approximately the same number of rows. Oracle Database determines the size of a granule automatically depending on data type, data format, and compression type. A higher compression level results in more rows in the IMCU.

A one-to-many mapping exists between an IMCU and a set of database blocks. As illustrated in Example 2-2, each IMCU stores the values for columns for a different set of blocks.

The columns in an IMCU are not sorted. Oracle Database populates them in the order that they are read from disk.

The number of rows in an IMCU dictates the amount of space an IMCU consumes. If the target number of rows causes an IMCU to grow beyond the amount of contiguous 1 MB extents available in the 1 MB pool, then the IMCU creates additional extents (pieces) to hold the remaining column CUs. An IMCU always allocates space in 1 MB increments.

Example 2-2 IMCUs and Row Subsets

In this simplified example, only the following 4 columns of the customers table have the INMEMORY attribute: cust_id, cust_first_name, cust_last_name, and cust_gender. Only 5 rows exist in the table, stored in 2 data blocks. Conceptually, the first data block stores its rows as follows:

82,Madeline,Li,F;37004,Abel,Embrey,M;1714,Hardy,Gentle,M

The second data block stores rows as follows:

100439,Uma,Campbell,F;3047,Lucia,Downey,F

Assume IMCU 1 stores the data for the first data block. In this case, the cust_id column values for the 3 rows in this data block stores are stored “vertically” within a CU as follows:

82
37004
1714

IMCU 2 stores the data from the second data block. The cust_id column values for these 2 rows are stored within a CU as follows:

100439
3047

Because the cust_id value is the first value for each row in the data block, the cust_id column is in the first position within the IMCU. Columns always occupy the same position, so Oracle Database can reconstruct the rows by reading the IMCUs for a segment.

Column Compression Units (CUs)

A Column Compression Unit (CU) is contiguous storage for a single column in an IMCU. Every IMCU has one or more CUs.

Structure of a CU

A CU is divided into a body and a header.

The body of every CU stores the column values for the range of rows included in the IMCU. The header contains metadata about the values stored in the CU body, for example, the minimum and maximum value within the CU. It may also contain a local dictionary, which is a sorted list of the distinct values in that column and their corresponding dictionary codes.

The following figure shows an IMCU with 4 CUs for the sales table: prod_id, cust_id, time_id, and channel_id. Each CU stores the column values for the range of rows included in the IMCU.

The CUs store values in rowid order. For this reason, the database can answer queries by “stitching” the rows back together. For example, an application issues the following query:

SELECT cust_id, time_id, channel_id
FROM   sales
WHERE  prod_id =5;

The database begins by scanning the prod_id column for entries with the value 5. Assume that the database finds 5 in position two in the prod_id column. The database now must find the corresponding cust_id, time_id, and channel_id for this row. Because the CUs store data in rowid order, the database can find the corresponding cust_id, time_id, and channel_id values in position 2 in those columns. Thus, to answer the query, the database must extract the values from position 2 in the cust_id, time_id, and channel_id columns, and then stitch the row back together to return it to the end user.

Local Dictionary

In a CU, the local dictionary has a list of distinct values and their corresponding dictionary codes.

The local dictionary stores the symbol contained in the column. The following figure illustrates how a CU stores a name column in a vehicles table.

In the preceding figure, the CU contains only 7 rows. Every distinct value in this CU, such as Cadillac or Audi, is assigned a different dictionary code, such as 2 for Cadillac and 0 for Audi. The CU stores the dictionary code rather than the original value.

Note:

When the database uses a common dictionary for a join group, the local dictionary contains references to the common dictionary rather than the symbols. For example, rather than storing the values Audi, BWM, and Cadillac for the vehicles.name column, the local dictionary stores dictionary codes such as 101, 220, and 66.

The CU header contains the minimum and maximum values for the column. In this example, the minimum value is Audi and the maximum value is Cadillac. The local dictionary stores the list of distinct values: Audi, BMW, and Cadillac. Their corresponding dictionary codes (0, 1, and 2) are implicit. The local dictionary for a CU in each IMCU is independent of the local dictionaries in other IMCUs.

If a query filters on Audi automobiles, then the database scans this IMCU for only 0 codes.

In-Memory Storage Indexes

Every IMCU header automatically creates and manages In-Memory Storage Indexes (IM storage indexes) for its CUs. An IM storage index stores the minimum and maximum for all columns within the IMCU.

For example, sales is populated in the IM column store. Every IMCU for this table has all columns. The sales.prod_id column is stored in a separate CU within every IMCU. The IMCU header has the minimum and maximum values of each prod_id CU (and every other CU).

To eliminate unnecessary scans, the database can perform IMCU pruning based on SQL filter predicates. The database scans only the IMCUs that satisfy the query predicate, as shown in the WHERE prod_id > 14 AND prod_id < 29 example in the following graphic.

Figure 2-8 Storage Index for Columnar Data

Description of Figure 2-8 follows
Description of "Figure 2-8 Storage Index for Columnar Data"

Snapshot Metadata Units (SMUs)

A Snapshot Metadata Unit (SMU) contains metadata and transactional information for an associated IMCU.

IMCUs and SMUs

The columnar pool of the In-Memory Area stores the actual data: IMCUs and IMEUs. The metadata pool in the In-Memory Area stores the SMUs.

Figure 2-9 IMCUs and SMUs

This figure shows IMCUs in the data pool, and SMUs in the metadata pool.

Description of Figure 2-9 follows
Description of "Figure 2-9 IMCUs and SMUs"

Every IMCU maps to a separate SMU. Thus, if the columnar data pool contains 100 IMCUs, then the metadata pool contains 100 SMUs. The SMUs store several types of metadata for their associated IMCUs, including the following:

  • Object numbers

  • Column numbers

  • Mapping information for rows

Transaction Journal

Every SMU contains a transaction journal. The database uses the transaction journal to keep the IMCU transactionally consistent.

The database uses the buffer cache to process DML, just as when the IM column store is not enabled. For example, an UPDATE statement might modify a row in an IMCU. In this case, the database adds the rowid for the modified row to the transaction journal and marks it stale as of the SCN of the DML statement. If a query needs to access the new version of the row, then the database obtains the row from the database buffer cache.

Figure 2-10 Transaction Journal

Description of Figure 2-10 follows
Description of "Figure 2-10 Transaction Journal"

The database achieves read consistency by merging the contents of the column, transaction journal, and buffer cache. When the IMCU is refreshed during repopulation, queries can access the up-to-date row directly from the IMCU.

See Also:

"Optimizing Repopulation of the IM Column Store" for an in-depth discussion of how the IM column store maintains transactional consistency

In-Memory Expression Units (IMEUs)

An In-Memory Expression Unit (IMEU) is a storage container for materialized In-Memory Expressions (IM expressions) and user-defined virtual columns.

The database treats materialized expressions just like other columns in the IMCU. Conceptually, an IMEU is a logical extension of its parent IMCU. Just as an IMCU can contain multiple columns, an IMEU can contain multiple virtual columns.

Every IMEU maps to exactly one IMCU, mapping to the same row set. The IMEU contains expression results for the data contained in its associated IMCU. When the IMCU is populated, the associated IMEU is also populated.

A typical IM expression involves one or more columns, possibly with constants, and has a one-to-one mapping with the rows in the table. For example, an IMCU for an employees table contains rows 1–1000 for the column weekly_salary. For the rows stored in this IMCU, the IMEU calculates the automatically detected IM expression weekly_salary*52, and the user-defined virtual column quarterly_salary defined as weekly_salary*12. The 3rd row down in the IMCU maps to the 3rd row down in the IMEU.

The IMEU is a logical extension of the IMCUs of a particular segment. By default, the IMEU inherits the INMEMORY clause properties, including Oracle Real Application Clusters (Oracle RAC) properties such as DISTRIBUTE and DUPLICATE, from the base segment. You can selectively enable or disable virtual columns for storage in IMEUs. You can also specify compression levels for different columns.

Expression Statistics Store (ESS)

The Expression Statistics Store (ESS) is a repository maintained by the optimizer to store statistics about expression evaluation. The ESS resides in the SGA and persists on disk.

When an IM column store is enabled, the database leverages the ESS for its In-Memory Expressions (IM expressions) feature. However, the ESS is independent of the IM column store. The ESS is a permanent component of the database and cannot be disabled.

The database uses the ESS to determine whether an expression is “hot” (frequently accessed), and thus a candidate for an IM expression. During a hard parse of a query, the ESS looks for active expressions in the SELECT list, WHERE clause, GROUP BY clause, and so on.

For each segment, the ESS maintains expression statistics such as the following:

  • Frequency of execution

  • Cost of evaluation

  • Timestamp evaluation

The optimizer assigns each expression a weighted score based on cost and the number of times it was evaluated. The values are approximate rather than exact. More active expressions have higher scores. The ESS maintains an internal list of the most frequently accessed expressions.

Control the behavior of IM expressions using the DBMS_INMEMORY_ADMIN package. For example, the IME_CAPTURE_EXPRESSIONS procedure prompts the database to identify and gradually populate the hottest expressions in the database. The IME_POPULATE_EXPRESSIONS procedure forces the database to populate the expressions immediately.

ESS information is stored in the data dictionary and exposed in the DBA_EXPRESSION_STATISTICS view. This view shows the metadata that the optimizer has collected in the ESS. IM expressions are exposed as system-generated virtual columns, prefixed by the string SYS_IME, in the DBA_IM_EXPRESSIONS view.

See Also:

In-Memory Process Architecture

In response to queries and DML, server processes scan columnar data and update SMU metadata. Background processes populate row data from disk into the IM column store.

In-Memory Coordinator Process (IMCO)

The In-Memory Coordinator Process (IMCO) manages many tasks for the IM column store. Its primary task is to initiate background population and repopulation of columnar data.

Population is a streaming mechanism, converting row data into columnar format, and then compressing it. IMCO automatically initiates population of INMEMORY objects with any priority other than NONE. When objects with priority NONE are accessed, IMCO populates them using Space Management Worker Process (Wnnn) processes.

The IMCO background process also initiates threshold-based repopulation of IM column store objects when they meet a staleness threshold. IMCO may instigate trickle repopulation for any IMCU in the IM column store that has stale entries but does not meet the staleness threshold.

Trickle repopulation occurs automatically in the background. The steps are as follows:

  1. IMCO wakes up.

  2. IMCO determines whether population tasks need to be performed, including whether any stale entries exist in an IMCU.

  3. If IMCO finds stale entries, then it triggers a Space Management Worker Process to repopulate these entries in the IMCU.

  4. IMCO sleeps for two minutes, and then returns to Step 1.

See Also:

Space Management Worker Processes (Wnnn)

Space Management Worker Processes (Wnnn) populate or repopulate data on behalf of IMCO.

During population, Wnnn processes are responsible for creating IMCUs, SMUs, and IMEUs. When creating IMEUs, the worker processes perform the following tasks:

  • Identify virtual columns for population

  • Create virtual column values

  • Compute values for each row, transform the data into columnar format, and compress it

  • Register the objects with the space layer

  • Associate the IMEUs with their corresponding IMCUs

Note:

During IMEU creation, parent IMCUs remain available for queries.

During repopulation, the Wnnn processes create new versions of the IMCUs based on the existing IMCUs and transactions journals, while temporarily retaining the old versions. This mechanism is called double buffering.

The database can quickly move IM expressions in and out of the IM column store. For example, if an IMCU was created without an IMEU, then the database can add an IMEU later without forcing the IMCU to undergo the full repopulation mechanism.

The INMEMORY_MAX_POPULATE_SERVERS initialization parameter controls the maximum number of worker processes that can be started for population. The INMEMORY_TRICKLE_REPOPULATE_PERCENT initialization parameter controls the maximum percentage of time that worker processes can perform trickle repopulation.

In-Memory Dynamic Scans

In-Memory Dynamic Scans (IM dynamic scans) use lightweight threads to parallelize In-Memory table scans.

Purpose of IM Dynamic Scans

When additional CPU is available, IM dynamic scans accelerate In-Memory table scans that are CPU bound.

IM dynamic scans automatically use idle CPU resources to scan IMCUs in parallel and maximize CPU usage. When CPU resources are available, applications can get even faster analytic query results automatically. Because the scans are dynamic, they enable the use of excess CPU bandwidth without affecting existing workload.

IM dynamic scans are more flexible than traditional Oracle parallel execution, although the two are not mutually exclusive. Dynamic scans use multiple lightweight threads of execution within a process. Typically, the performance overhead for dynamic scans is low.

See Also:

Oracle Database Administrator’s Guide to learn more about Resource Manager

How IM Dynamic Scans Work

IM Dynamic Scans attain optimal performance by reading IMCUs in parallel.

About Lightweight Threads

A lightweight thread is an execution entity that helps to parallelize full table scans. It is “lightweight” because it does not incur the higher memory overhead of Oracle processes.

Note:

A lightweight thread used by IM dynamic scans is not the same as a regular thread in the multithreaded Oracle Database model.

Lightweight threads share the resources of the parent foreground or PQ process, called the table scan process, that coordinates the scan of a set of IMCUs. Threads maintain their own independent flow of execution. The database can parallelize scans by prioritizing threads and executing them asynchronously.

For eligible queries, the process allocates a pool of threads. Resource Manager automatically determines the number of threads in the pool based on the CPU count in the database host and the current load on the system. The pool of threads remains available to the session for subsequent queries unless the idle time reaches an internal threshold, at which point the database terminates the threads.

Communication between threads occurs exclusively within a process. For this reason, contention does not occur at the database instance level.

See Also:

Oracle Database Concepts to learn about the multithreaded Oracle Database model

When the Database Considers IM Dynamic Scans

Lightweight threads are enabled when a CPU resource plan is enabled (for example, RESOURCE_MANAGER_PLAN=DEFAULT_PLAN) and CPU utilization of the database is low.

If lightweight threads are enabled, then the database considers an IM dynamic scan when an application queries an object that is currently populated in the IM column store. Typically, a serial or parallel query is a candidate for IM dynamic scans when it has the following characteristics:

  • Accesses a high number of IMCUs or columns

  • Consumes all rows in the table

  • Is CPU-intensive

Oracle Database Resource Manager (the Resource Manager), which is automatically enabled when INMEMORY_SIZE is greater than 0, is required for IM dynamic scans. The Resource Manager decides when and how to use the lightweight threads. Lightweight threads are the lowest priority operation in the database because they are capitalizing on unused resources.

Note:

CPU_COUNT must be greater than or equal to 24 in order to perform IM dynamic scans.
How IM Dynamic Scans Work

Resource Manager allocates lightweight threads to parallelize the scan of IMCUs.

When the database determines that a query can benefit from an IM dynamic scan, it typically proceeds as follows:

  1. A table scan process spawns a pool of lightweight threads.

  2. The table scan process creates a separate task for every IMCU that must be scanned, and then adds each task to a task queue.

  3. Resource Manager determines how many threads can participate in the table scan.

  4. Active threads pick up tasks from the task queue, with the table scan process consuming results from completed tasks.

Depending on the database load, Resource Manager continuously adjusts the number of active lightweight threads while the query is running. If CPU resources are not available, then the table scan process performs the scan without using lightweight threads.

The following graphic illustrates an IM dynamic scan of 12 IMCUs in the sales table.

In the preceding graphic, the database host has 8 CPU cores. Based on an internal algorithm, Resource Manager assigns 4 threads to assist the table scan process. In this scenario, 4 CPU cores remain idle for other concurrent database operations to use.

Interface for IM Dynamic Scans

IM dynamic scans are transparent, which means that they require no application changes and are automatically controlled by the Resource Manager.

IM dynamic scans require the Resource Manager, which is automatically enabled when INMEMORY_SIZE is greater than 0. No specific resource plan is required.

Several new session statistics track the usage of IM dynamic scans. Each thread writes trace data to a separate trace file.

Execution plans are unchanged. The following figure shows a sample execution plan.

SQL> SELECT MAX(l_quantity) largest_order FROM lineitem;

LARGEST_ORDER
-------------
           50

Elapsed: 00:00:03.41

Execution Plan
----------------------------------------------------------
Plan hash value: 1885658499
-------------------------------------------------------------------------------------------
|Id| Operation                   | Name     |Rows| Bytes |Cost(%CPU)| Time  |Pstart| Pstop|
-------------------------------------------------------------------------------------------
|0| SELECT STATEMENT             |          |    1 |    3| 116K  (4)| 00:00:05 |    |     |
|1|  SORT AGGREGATE              |          |    1 |    3|          |          |    |     |
|2|   PARTITION RANGE ALL        |          | 600M |1716M| 116K  (4)| 00:00:05 |  1 |  84 |
|3|    TABLE ACCESS INMEMORY FULL| LINEITEM | 600M |1716M| 116K  (4)| 00:00:05 |  1 |  84 |
-------------------------------------------------------------------------------------------

NAME                                              VALUE
-------------------------------------------------------
IM scan CUs memcompress for query low              1147
IM scan bytes in-memory                      5.1790E+10
IM scan bytes uncompressed                   7.6722E+10
IM scan CUs columns accessed                       1147
IM scan rows                                  600037902
IM scan rows projected                               29
IM scan (dynamic) rows                        600037902
IM scan (dynamic) multi-threaded scans                1
IM scan (dynamic) tasks processed by thread        1146

Consider the characteristics of the plan:

  1. The execution plan is unchanged.

    Note that the plan does not mention IM dynamic scans in Step 3. However, clicking the binocular icon in a SQL Monitor report would show “Dynamic Scan Tasks on Thread.”

  2. IM scan (dynamic) multi-threaded scans is nonzero, which means that the database used an IM dynamic scan.

  3. IM scan CUs memcompress for query low indicates that 1147 IMCUs exist in the lineitem table.

  4. IM scan (dynamic) tasks processed by thread indicates how many IMCUs were processed in parallel.

    The number is 1146, which is less than the total number of 1147 shown in IM scan CUs memcompress for query low. The database analyzed the first IMCU without parallelization to determine whether parallelization was worthwhile. Because the answer was yes, the database proceeded to scan the remaining 1146 IMCUs in parallel.

  5. IM scan (dynamic) rows and IM scan rows are equal, which means that the threads retrieved all rows for the query.

See Also:

CPU Architecture: SIMD Vector Processing

For data that is populated in the IM column store, the database uses SIMD (single instruction, multiple data) processing.

A SIMD unit is a processor that enables a single instruction to process data as a unit, called a vector, rather than processing data in separate instructions. For example, instead of using a loop to execute four addition operations, SIMD could load the four sets of numbers into vectors and perform one addition operation. SIMD processing is sometimes called vectorization.

The IM column store maximizes the number of column entries that the CPU can load into the vector registers and evaluate. Instead of evaluating each entry in the column one at a time, the database evaluates a set of column values in a single CPU instruction. SIMD vector processing enables the database to scan billions of rows per second.

For example, an application issues a query to find the total number of orders in the sales table that use the promo_id value of 9999. The sales table resides in the IM column store. The query begins by scanning only the sales.promo_id column, as shown in the following diagram:

Figure 2-12 SIMD Vector Processing

Description of Figure 2-12 follows
Description of "Figure 2-12 SIMD Vector Processing"

The CPU evaluates the data as follows:

  1. Loads the first 8 values (the number varies depending on data type and compression mode) from the promo_id column into the SIMD register, and then compares them with the value 9999 in a single instruction

  2. Discards the entries.

  3. Loads another 8 values into the SIMD register, and then continues in this way until it has evaluated all entries.

SIMD and Non-JSON LOBs

Oracle Database 18c provides SIMD vector support for queries involving SQL operators on LOB columns. This section describes LOBs that are not JSON.

When the LOBs are not JSON, the nature of the support depends on the type of LOB:

  • Inline LOBs

    The IM column store provides contiguous storage for inline LOBs, which are LOBs less than 4 KB, within the IMCUs. Columnar storage enables faster query processing by removing the overhead of assembling LOB data from the database buffer cache.

  • Out-of-line LOBs

    In this case, the IM column store only stores the LOB locator, which is 40 byes. Out-of-line columns do not benefit from columnar optimization.

SIMD Access for JSON Data

In-Memory columns can contain JSON data stored in either the JSON data type or in text-based LOB columns.

Regardless of the data type of the In-Memory column (VARCHAR2, CLOB, BLOB, or JSON), JSON data is stored in OSON, which is Oracle's optimized binary JSON format. The OSON format can provide faster query performance using SIMD processing.

See Also:

SIMD and Oracle Numbers

For tables compressed with QUERY LOW, NUMBER columns are encoded using an optimized format that enables native calculations in hardware.

SIMD vector processing enables simple aggregations, GROUP BY aggregations, and arithmetic operations to benefit significantly. The performance improvement depends on the amount of time the aggregation spends on arithmetic computation. Some aggregations may benefit by up to a factor of 9.

SIMD and Exadata Smart Flash Cache

Besides storing data in Hybrid Columnar Compression format, Exadata Smart Flash Cache can store data in pure columnar format.

Exadata Smart Scan supports SIMD predicates. The advantage is that In-Memory performance extends from DRAM storage to secondary storage.

By default, Exadata Smart Flash Cache compresses data using the level MEMCOMPRESS FOR CAPACITY LOW. To change the compression level or disable the columnar format altogether, use the ALTER TABLE ... NO CELLMEMORY statement.