Bad Data Detection

DSE includes bad data detection to identify questionable measurements. By default this is a two-step process. First, once the DSE solution is complete, we perform a source-wide statistical consistency test using a chi-squared distribution. If this test passes, the per-measurement check is skipped. If it fails, we perform a per-measurement check for outliers. The rule DSE_ALWAYS_COMPUTE_NORM_RES can be used to ignore the result of the first step, so that the per-measurement check is always performed.

Detecting if Bad Data Exists

To test whether bad data exists on the source, the following steps are performed:

• Test statistic: J = res^T W res, the weighted sum of squared residuals at the final solution, where res are measurement residuals (measured - estimated) and W is the weighting matrix (our weights squared).

• Degrees of freedom (dof): dof = M - N_eff, where M is the number of used measurements (excluding pseudo/zero-injection entries that participate in the fit) and N_eff is the effective number of estimated state variables. Disabled measurements are excluded.

• Distribution: Under standard assumptions (independent, zero-mean normal noise consistent with configured errors), J approximately follows a chi-squared distribution with dof degrees of freedom.

• Decision rule:

• Compute the critical value c = chi2_inverse(p_threshold, dof), where p_threshold is configured by the rule DSE_BADDATA_PVAL.

• If J > c, the global test fails (the residual set is inconsistent at the configured level), and DSE marks the source as having suspected bad data.

• If J ≤ c, the global test passes.

• For example:

• Suppose dof=500 and DSE_BADDATA_PVAL = 0.95.

• Then the critical value c = chi2_inverse(0.95, 500) is approximately 553.

• If J = 610, then J > c and the test fails (bad data is suspected).

• If J = 480, then J < c and the test passes.
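
The decision rule above can be sketched in a few lines of Python. This is a minimal illustration and not the DSE implementation: the function names are hypothetical, and chi2_inverse is approximated with the Wilson-Hilferty formula so that only the standard library is needed. In practice an exact chi-squared inverse CDF from a statistics library would normally be used instead.

```python
from statistics import NormalDist


def chi2_inverse_approx(p, dof):
    """Approximate chi-squared inverse CDF (Wilson-Hilferty formula).

    Stand-in for an exact chi2_inverse; the approximation is good for
    large dof, which is the typical case here.
    """
    z = NormalDist().inv_cdf(p)  # standard normal quantile for p
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3


def weighted_sse(residuals, weights):
    """J = res^T W res with W = diag(w^2), i.e. the sum of (w_i * res_i)^2."""
    return sum((w * r) ** 2 for r, w in zip(residuals, weights))


def global_test(j_stat, dof, p_threshold):
    """Return (passed, critical_value) for the global chi-squared test."""
    if dof <= 0:
        raise ValueError("dof must be positive")
    c = chi2_inverse_approx(p_threshold, dof)
    return j_stat <= c, c


# The worked example from the text: dof = 500, p_threshold = 0.95.
for j_stat in (610.0, 480.0):
    passed, c = global_test(j_stat, 500, 0.95)
    print(f"J = {j_stat:.0f}, c = {c:.1f}, passed = {passed}")
```

Here p_threshold plays the role of DSE_BADDATA_PVAL, and weighted_sse shows how J would be formed from the residuals and the 1/sigma weights.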

Detecting Which Data is Likely Bad

The chi-squared test is global only: it indicates that bad data is present, but not where. To identify which measurements are bad, we use a largest normalized residual test (also called an internally studentized residual). We estimate the standard deviation of each residual and divide the residual by it, giving the normalized residual z = |r|/SigmaEst. In the absence of bad data, r/SigmaEst approximately follows a standard normal distribution. Specifically, we compute:

• Start with the residual vector r and the weights w (our 1/sigma from What Is State Estimation).

• Compute the measurement Jacobian J (the same as the full Jacobian but restricted to measurement rows). Note that this J is not the test statistic J used in the global test.

• Compute the Gain Matrix: G = J^T W^2 J, where W = diag(w)

• Compute the Hat Matrix: H = J G^-1 J^T W^2

• The residual sensitivity matrix is then: S = I - H (the diagonal of H is clamped to lie between 0 and 1)

• Then the residual Covariance matrix is: Ω = S W^-2 (that is, S multiplied by the inverse of W^2)

• And finally the normalized residuals: norm_res_i = |r_i| / sqrt(Ω_ii)

• If norm_res_i > DSE_BADDATA_RES_TOL, the measurement is marked as bad.

Note: If the computed covariance for a measurement is too small, we flag the measurement as bad with a value of 1e+10.
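
The steps above can be sketched with NumPy as follows. This is an illustrative sketch and not the DSE implementation: the function name and the min_var cutoff are hypothetical (the text only says "too small"), and res_tol stands in for DSE_BADDATA_RES_TOL. Only the diagonals of H and Ω are formed, since those are all the test needs.

```python
import numpy as np


def normalized_residuals(jac, r, w, res_tol, min_var=1e-12, flagged_value=1e10):
    """Return (norm_res, bad) for the largest normalized residual test.

    jac: measurement Jacobian (m x n), r: residuals (m), w: weights 1/sigma (m).
    min_var is an assumed cutoff for a covariance that is "too small".
    """
    jac = np.asarray(jac, dtype=float)
    r = np.asarray(r, dtype=float)
    w2 = np.asarray(w, dtype=float) ** 2

    # Gain matrix G = J^T W^2 J, with W = diag(w).
    gain = jac.T @ (w2[:, None] * jac)

    # Diagonal of the hat matrix H = J G^-1 J^T W^2.
    g_inv_jt = np.linalg.solve(gain, jac.T)            # G^-1 J^T, shape (n, m)
    h_diag = np.einsum("ij,ji->i", jac, g_inv_jt) * w2
    h_diag = np.clip(h_diag, 0.0, 1.0)                 # clamp to [0, 1]

    # Diagonal of the residual covariance: (I - H)_ii / w_i^2.
    omega_diag = (1.0 - h_diag) / w2

    # Measurements whose variance is too small get the sentinel value,
    # which exceeds any practical tolerance and so marks them as bad.
    norm_res = np.full(r.shape, flagged_value)
    ok = omega_diag > min_var
    norm_res[ok] = np.abs(r[ok]) / np.sqrt(omega_diag[ok])

    bad = norm_res > res_tol
    return norm_res, bad


# Four unit-weight direct measurements of a single state; the last is an outlier.
jac = np.ones((4, 1))
r = np.array([-1.0, -1.0, -1.0, 3.0])   # residuals at the least-squares solution
w = np.ones(4)
norm_res, bad = normalized_residuals(jac, r, w, res_tol=3.0)
print(norm_res, bad)
```

In this small case each H_ii is 0.25, so each Ω_ii is 0.75. The first three normalized residuals are about 1.155 and the last is about 3.464, so with a tolerance of 3.0 only the last measurement is marked as bad.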

We report all the normalized residuals and bad data flags in the results tab of the State Estimation tool.