1.2.8 Exadata Hybrid Columnar Compression

Exadata Hybrid Columnar Compression stores data using column organization, which brings similar values close together and enhances compression ratios.

Using Exadata Hybrid Columnar Compression, data is organized into sets of rows called compression units. Within a compression unit, data is organized by column and then compressed. Each row is self-contained within a compression unit.

Database operations work transparently against compressed objects, so no application changes are required. The database compresses data manipulated by any SQL operation, although compression levels are higher for direct path loads.

You can specify the following types of Exadata Hybrid Columnar Compression, depending on your requirements:

  • Warehouse compression: This type of compression is optimized for query performance, and is intended for data warehouse applications.
  • Archive compression: This type of compression is optimized for maximum compression levels, and is intended for historic data and data that does not change.

Assume that you apply Exadata Hybrid Columnar Compression to a daily_sales table. At the end of every day, the table is populated with items and the number sold, with the item ID and date forming a composite primary key. A row subset is shown in the following table.

Table 1-1 Sample Table daily_sales

Item_ID Date Num_Sold Shipped_From Restock

1000

01-JUN-07

2

WAREHOUSE1

Y

1001

01-JUN-07

0

WAREHOUSE3

N

1002

01-JUN-07

1

WAREHOUSE3

N

1003

01-JUN-07

0

WAREHOUSE2

N

1004

01-JUN-07

2

WAREHOUSE1

N

1005

01-JUN-07

1

WAREHOUSE2

N

The database stores a set of rows in an internal structure called a compression unit. For example, assume that the rows in the previous table are stored in one unit. Exadata Hybrid Columnar Compression stores each unique value from column 4 with metadata that maps the values to the rows. Conceptually, the compressed value can be represented as:

WAREHOUSE1WAREHOUSE3WAREHOUSE2

The database then compresses the repeated word WAREHOUSE in this value by storing it once and replacing each occurrence with a reference. If the reference is smaller than the original word, then the database achieves compression. The compression benefit is particularly evident for the Date column, which contains only one unique value.

As shown in the following illustration, each compression unit can span multiple data blocks. The values for a particular column may or may not span multiple blocks.

Exadata Hybrid Columnar Compression has implications for row locking. When an update occurs for a row in an uncompressed data block, only the updated row is locked. In contrast, the database must lock all rows in the compression unit if an update is made to any row in the unit. Updates to rows using Exadata Hybrid Columnar Compression cause rowids to change.

Note:

When tables use Exadata Hybrid Columnar Compression, Oracle DML locks larger blocks of data (compression units) which may reduce concurrency.

Oracle Database supports four methods of table compression.

Table 1-2 Table Compression Methods

Table Compression Method Compression Level CPU Overhead Applications

Basic compression

High

Minimal

DSS

OLTP compression

High

Minimal

OLTP, DSS

Warehouse compression

Higher (compression level depends on compression level specified (LOW or HIGH))

Higher (CPU overhead depends on compression level specified (LOW or HIGH))

DSS

Archive compression

Highest (compression level depends on compression level specified (LOW or HIGH))

Highest (CPU overhead depends on compression level specified (LOW or HIGH))

Archiving

Warehouse compression and archive compression achieve the highest compression levels because they use Exadata Hybrid Columnar Compression technology. Exadata Hybrid Columnar Compression technology uses a modified form of columnar storage instead of row-major storage. This enables the database to store similar data together, which improves the effectiveness of compression algorithms. Because Exadata Hybrid Columnar Compression requires high CPU overhead for DML, use it only for data that is updated infrequently.

The higher compression levels of Exadata Hybrid Columnar Compression are achieved only with data that is direct-path inserted. Conventional inserts and updates are supported, but result in a less compressed format, and reduced compression level.

The following table lists characteristics of each table compression method.

Table 1-3 Table Compression Characteristics

Table Compression Method CREATE/ALTER TABLE Syntax Direct-Path Insert DML

Basic compression

COMPRESS [BASIC]

COMPRESS and COMPRESS BASIC are equivalent

Yes

Yes

Note: Inserted and updated rows are uncompressed.

OLTP compression

COMPRESS FOR OLTP

Yes

Yes

Warehouse compression

COMPRESS FOR QUERY [LOW|HIGH]

Yes

Yes

High CPU overhead.

Note: Inserted and updated rows go to a block with a less compressed format and have lower compression level.

Archive compression

COMPRESS FOR ARCHIVE [LOW|HIGH]

Yes

Yes

Note: Inserted and updated rows are uncompressed. Inserted and updated rows go to a block with a less compressed format and have lower compression level.

The COMPRESS FOR QUERY HIGH option is the default data warehouse compression mode. It provides good compression and performance. The COMPRESS FOR QUERY LOW option should be used in environments where load performance is critical. It loads faster than data compressed with the COMPRESS FOR QUERY HIGH option.

The COMPRESS FOR ARCHIVE LOW option is the default archive compression mode. It provides a high compression level and good query performance. It is ideal for infrequently-accessed data. The COMPRESS FOR ARCHIVE HIGH option should be used for data that is rarely accessed.

A compression advisor, provided by the DBMS_COMPRESSION package, helps you determine the expected compression level for a particular table with a particular compression method.

You specify table compression with the COMPRESS clause of the CREATE TABLE command. You can enable compression for an existing table by using these clauses in an ALTER TABLE statement. In this case, only data that is inserted or updated is compressed after compression is enabled. Similarly, you can disable table compression for an existing compressed table with the ALTER TABLE...NOCOMPRESS command. In this case, all data that was already compressed remains compressed, and new data is inserted uncompressed.