Data Preparation Procedures

This section describes the data preparation procedures used by Oracle Utilities to transform the provided raw dataset into a panel dataset which can be entered into a regression model.

Estimated Read True-Up

Some utilities estimate usage for a billing period to reduce operational costs, and a subsequent meter read provides the actual usage. To increase measurement accuracy, Oracle Utilities trues up the original usage estimates. It first finds the estimate correction, which is the difference between the actual read and the estimated read, and then adds that correction to the original estimate to get the actual usage for the billing period. See the image below for more information.

  1. The bill for August 15 through September 15 is based on estimated usage.
  2. The actual read on October 15 consists of actual usage over the period of September 15 through October 15 plus an estimate correction.

In the image above, truing up means adding the usage over periods A and B to obtain the actual usage from August 15 through October 15.
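
The combination of an estimated bill with the actual read that follows it can be sketched in Python as shown here. This is a minimal illustration, not Oracle Utilities' implementation: the `Read` class, its field names, and the `true_up` function are hypothetical, and the usage figures are invented for the example. The sketch assumes that each run of estimated reads is folded into the next actual read, and that estimated reads with no later actual read are left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Read:
    # Hypothetical record layout, for illustration only.
    start: date
    end: date
    usage: float
    estimated: bool


def true_up(reads):
    """Fold each run of estimated reads into the actual read that follows it."""
    merged = []
    pending = []  # estimated reads still waiting for an actual read
    for read in sorted(reads, key=lambda r: r.start):
        if read.estimated:
            pending.append(read)
            continue
        start = pending[0].start if pending else read.start
        usage = sum(p.usage for p in pending) + read.usage
        merged.append(Read(start, read.end, usage, estimated=False))
        pending = []
    # Assumption: trailing estimates with no later actual read are dropped.
    return merged


reads = [
    # Estimated bill for August 15 through September 15.
    Read(date(2023, 8, 15), date(2023, 9, 15), 500.0, estimated=True),
    # Actual read on October 15, which carries the estimate correction.
    Read(date(2023, 9, 15), date(2023, 10, 15), 620.0, estimated=False),
]
combined = true_up(reads)
```

In this example, `combined` holds a single read that spans August 15 through October 15 with 1120.0 units of usage.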

Oracle Utilities cannot tell whether a customer’s first non-estimated read in the raw billing dataset includes an estimate correction that trues up prior estimates. For this reason, the first non-estimated bill for each customer, and all estimated reads prior to that bill, are excluded from the prepared dataset.
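
The exclusion rule above can be sketched as a small filter over one customer's reads. The dictionary keys (`end`, `estimated`) and the function name `drop_through_first_actual` are hypothetical. Returning an empty list when a customer has no non-estimated read at all is an assumption, because the text does not cover that case.

```python
from datetime import date


def drop_through_first_actual(reads):
    """Drop the first non-estimated read and every read before it."""
    ordered = sorted(reads, key=lambda r: r["end"])
    for i, read in enumerate(ordered):
        if not read["estimated"]:
            # Everything up to and including index i is excluded.
            return ordered[i + 1:]
    # Assumption: with no actual read, nothing is kept.
    return []


customer_reads = [
    {"end": date(2023, 1, 15), "estimated": True},
    {"end": date(2023, 2, 15), "estimated": False},  # first actual read
    {"end": date(2023, 3, 15), "estimated": True},
    {"end": date(2023, 4, 15), "estimated": False},
]
kept = drop_through_first_actual(customer_reads)
```

Here `kept` contains only the March 15 and April 15 reads.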


Billing Calendarization

Calendarization is the process of pro-rating billing data into calendar months. Calendarization can smooth out billing data with read durations longer than one month or billing data with a significant percentage of estimated reads.

For example, a 30-day bill dated August 15 includes usage that occurred in the second half of July, but does not include usage from the second half of August. The image below demonstrates the span of this billing period.

Previously, all usage in this period would be attributed to August for savings calculations, as shown in the image below.

Calendarization spreads usage into the calendar month in which it took place. Using calendarization, 15 out of the 30 days of usage are attributed to July and the remaining 15 days of usage are attributed to August. The image below shows how the process of calendarization affects the calendar month to which usage data is attributed.
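
The pro-rating in this example can be sketched in Python. The function name `calendarize`, the inclusive first and last day convention, the even split of usage across the days of the bill, and the 300-unit total are all assumptions made for illustration. The text specifies only that usage is pro-rated into calendar months.

```python
from collections import Counter
from datetime import date, timedelta


def calendarize(first_day, last_day, usage):
    """Split usage across calendar months in proportion to days covered.

    Both dates are inclusive. Assumes usage is spread evenly over the days.
    """
    n_days = (last_day - first_day).days + 1
    days_per_month = Counter()
    for offset in range(n_days):
        day = first_day + timedelta(days=offset)
        days_per_month[(day.year, day.month)] += 1
    return {
        month: usage * days / n_days
        for month, days in days_per_month.items()
    }


# A 30-day bill dated August 15 covers July 17 through August 15.
monthly = calendarize(date(2023, 7, 17), date(2023, 8, 15), 300.0)
```

With these inputs, `monthly` attributes 150.0 units to July 2023 and 150.0 units to August 2023, which matches the 15-day and 15-day split described above.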


Trim Billing

After calendarization, Oracle Utilities trims usage data to exclude outliers and remove inapplicable data. The following criteria are used to identify outliers and inapplicable data:

  • Usage occurring after the customer move out date.
  • Billing data with duration less than 1 day or more than 31 days (to trim months with overlapping meter reads).
  • Usage less than -300 kWh per day, greater than 300 kWh per day, less than -50 therms per day, or greater than 50 therms per day.
  • The most recent partial month. For example, if analysis is run in December, only usage through November is included in analysis.
  • Usage occurring more than 12 months prior to the program start date.
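
The criteria above can be read as a single keep-or-trim test per calendarized record, sketched below. The record layout (`month` as the first day of the calendar month, `days`, `usage`, `unit`) and the function name `keep_record` are hypothetical. Applying the move-out, partial-month, and 12-month rules at whole-month granularity is also an assumption.

```python
from datetime import date

# Daily usage limits from the criteria above, by unit.
DAILY_LIMITS = {"kwh": 300.0, "therms": 50.0}


def month_index(d):
    """Number months consecutively so they can be compared and subtracted."""
    return d.year * 12 + d.month


def keep_record(rec, run_date, program_start, move_out=None):
    """Return True if the record survives trimming, False otherwise."""
    idx = month_index(rec["month"])
    if move_out is not None and rec["month"] > move_out:
        return False  # usage after the customer move-out date
    if rec["days"] < 1 or rec["days"] > 31:
        return False  # duration outside 1 to 31 days
    if abs(rec["usage"] / rec["days"]) > DAILY_LIMITS[rec["unit"]]:
        return False  # daily usage beyond the limit in either direction
    if idx >= month_index(run_date):
        return False  # most recent partial month (or later)
    if idx < month_index(program_start) - 12:
        return False  # more than 12 months before program start
    return True


run_date = date(2023, 12, 10)
program_start = date(2023, 3, 1)
june = {"month": date(2023, 6, 1), "days": 30, "usage": 600.0, "unit": "kwh"}
december = {"month": date(2023, 12, 1), "days": 10, "usage": 200.0, "unit": "kwh"}
kept_june = keep_record(june, run_date, program_start)          # True
kept_december = keep_record(december, run_date, program_start)  # False
```

The duration check runs before the per-day calculation so that a record with zero days is trimmed without a division by zero.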


Average Pre-Treatment Usage Variables

The regression model uses billing data in the pre-treatment period to create three average usage variables for each customer.

  • Average usage per day: The average usage for a customer in the pre-treatment period (any meter reads that end prior to the treatment_start_date).
  • Average usage per day in summer: The average usage for a customer in the pre-treatment period over the summer months, defined as June through September.
  • Average usage per day in winter: The average usage for a customer in the pre-treatment period over the winter months, defined as December through March.
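
The three variables above can be sketched as follows. The record keys (`end`, `days`, `usage`) and the function names are hypothetical. Two choices in the sketch are assumptions: the average is computed as total usage divided by total days, and a read is assigned to a season by the month in which it ends.

```python
from datetime import date

SUMMER_MONTHS = {6, 7, 8, 9}   # June through September
WINTER_MONTHS = {12, 1, 2, 3}  # December through March


def avg_per_day(records):
    """Total usage divided by total days, or None if there are no days."""
    days = sum(r["days"] for r in records)
    if days == 0:
        return None
    return sum(r["usage"] for r in records) / days


def pre_treatment_averages(records, treatment_start_date):
    """Compute the three pre-treatment averages for one customer."""
    # Keep only reads that end prior to the treatment start date.
    pre = [r for r in records if r["end"] < treatment_start_date]
    summer = [r for r in pre if r["end"].month in SUMMER_MONTHS]
    winter = [r for r in pre if r["end"].month in WINTER_MONTHS]
    return {
        "avg_usage_per_day": avg_per_day(pre),
        "avg_usage_per_day_summer": avg_per_day(summer),
        "avg_usage_per_day_winter": avg_per_day(winter),
    }


records = [
    {"end": date(2023, 1, 31), "days": 31, "usage": 930.0},
    {"end": date(2023, 7, 31), "days": 31, "usage": 620.0},
    {"end": date(2023, 10, 31), "days": 31, "usage": 310.0},
    {"end": date(2023, 11, 30), "days": 30, "usage": 900.0},  # after treatment start
]
averages = pre_treatment_averages(records, date(2023, 11, 1))
```

For these invented records, the overall average is 20.0 per day, the summer average is 20.0 per day, and the winter average is 30.0 per day. The November read is ignored because it ends after the treatment start date.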
