Tune Performance with Excluded Columns

It's quite common for analyses to contain unused columns. Columns that aren't required but still selected significantly impact performance. An easy way to improve performance is to remove any column that isn't required.

Identify Unused Columns

When you include columns in the analysis that are excluded from the view (Year in this example), you impact performance:
  • Increase the volume of data that needs to be retrieved from the database.
  • Increase the number of columns to be retrieved and processed.
  • Force the analysis to compute results at multiple levels of aggregation.

Description of ceal_excluded_year_col.jpg follows
Description of the illustration ceal_excluded_year_col.jpg

For example, this simple report shows the number of customers by region and year.

Description of ceal_report_customers_by_region_year.jpg follows
Description of the illustration ceal_report_customers_by_region_year.jpg

The report that's displayed is a graph showing the number of customers by region. Notice that the Year column is excluded.

Description of ceal_report_graph_customers_by_region.jpg follows
Description of the illustration ceal_report_graph_customers_by_region.jpg

Logical Query with Unused Year Column

Even though the Year column isn't displayed in the view, it's still selected as part of the logical query.

Description of ceal_logical_query_includes_year.jpg follows
Description of the illustration ceal_logical_query_includes_year.jpg

Including the Year column in the analysis has the following impact:
  • Additional columns are retrieved and processed.
  • Additional rows are retrieved and processed, because the number of customer rows is selected not only by region but also by year.
  • Further aggregation is required.

Physical Query with Unused Year Column

If you review the physical query, you can identify areas where performance is impacted.

Description of ceal_-physical_query_performance_impact.jpg follows
Description of the illustration ceal_-physical_query_performance_impact.jpg

This example shows a SELECT count distinct from the customer number aggregation rule. In some situations, it also impacts reports with a sum aggregation rule. The generated query in this scenario also uses a grouping set. At the database level, it might be selecting many rows (millions) and then having to group by Year and Region, as well as Region. This can consume significant, unnecessary database resources.

Remove Unused Columns

Let's remove the Year column and analyze the impact on the logical and physical queries that are generated.

Description of ceal_deleted_col_selection_criteria.jpg follows
Description of the illustration ceal_deleted_col_selection_criteria.jpg

Logical Query After Year Column Removed

Now, the logical query doesn't contain the Year column and, more importantly, the report aggregation is removed.

Description of ceal_logical_query_no_year_or_report_agg.jpg follows
Description of the illustration ceal_logical_query_no_year_or_report_agg.jpg

Physical Query After Year Column Removed

The physical query is much simpler now that it doesn't include grouping sets and the number of records selected is greatly reduced.

Description of ceal_physical_query_no_grouping.jpg follows
Description of the illustration ceal_physical_query_no_grouping.jpg

By reviewing the analysis of non-performant reports, and in the first instance simply removing redundant unused columns, you can achieve significant performance gains. The physical query generated has reduced complexity and fewer records are returned, therefore less processing is required.