Bad Data Detection
DSE includes bad data detection to identify questionable measurements. By default this is a two-step process. First, once DSE has finished solving, it performs a source-wide statistical consistency test using a chi-squared distribution. If this test passes, no further check is performed. If it fails, DSE performs a per-measurement check for outliers. Setting the rule DSE_ALWAYS_COMPUTE_NORM_RES makes DSE ignore the result of the first step and always run the per-measurement check.
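The two-step flow can be sketched as follows. This is only an illustration: it assumes DSE_ALWAYS_COMPUTE_NORM_RES behaves as a boolean override, and the function and parameter names are hypothetical, not part of DSE.

```python
def should_run_per_measurement_check(global_test_passed: bool,
                                     always_compute_norm_res: bool = False) -> bool:
    """Decide whether step two (the per-measurement outlier check) runs.

    Hypothetical helper: assumes DSE_ALWAYS_COMPUTE_NORM_RES is a boolean
    rule that overrides the outcome of the global chi-squared test.
    """
    if always_compute_norm_res:
        # The result of the global test is ignored.
        return True
    # Default behavior: only look for outliers when the global test fails.
    return not global_test_passed


print(should_run_per_measurement_check(global_test_passed=True))   # False
print(should_run_per_measurement_check(global_test_passed=False))  # True
print(should_run_per_measurement_check(True, always_compute_norm_res=True))  # True
```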
Detecting if Bad Data Exists
To test whether bad data exists on the source, the following steps are performed:
Test statistic: J = res^T W res, the weighted sum of squared residuals at the final solution, where res is the vector of measurement residuals (measured - estimated) and W is the diagonal weighting matrix (our weights squared).
Degrees of freedom (dof): dof = M - N_eff, where M is the number of used measurements (excluding pseudo/zero-injection entries that participate in the fit) and N_eff is the effective number of estimated state variables. Disabled measurements are excluded.
Distribution: Under standard assumptions (independent, zero-mean normal noise consistent with configured errors), J approximately follows a chi-squared distribution with dof degrees of freedom.
Decision rule:
Compute the critical value c = chi2_inverse(p_threshold, dof), where p_threshold is configured by the rule DSE_BADDATA_PVAL.
If J > c, the global test fails (the residual set is inconsistent at the configured level), and DSE marks the source as having suspected bad data.
If J ≤ c, the global test passes.
For example:
Suppose dof = 500 and DSE_BADDATA_PVAL = 0.95.
Then the critical value c = chi2_inverse(0.95, 500) is approximately 553.
If J = 610, then J > c and the test fails (bad data is suspected).
If J = 480, then J ≤ c and the test passes.
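The global test and the worked example can be reproduced with a short Python sketch. To stay within the standard library it uses the Wilson-Hilferty approximation for the chi-squared quantile, which is an assumption made here for illustration and is not necessarily how DSE evaluates chi2_inverse. All function names are hypothetical.

```python
from statistics import NormalDist


def chi2_inverse_approx(p: float, dof: int) -> float:
    """Approximate chi-squared quantile (Wilson-Hilferty), good for large dof."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3


def weighted_sum_of_squares(residuals, weights):
    """J = sum((w_i * res_i)^2), i.e. res^T W res with W = diag(w_i^2)."""
    return sum((w * r) ** 2 for r, w in zip(residuals, weights))


def global_test_fails(j_stat: float, dof: int, p_threshold: float) -> bool:
    """Return True when J exceeds the critical value (bad data suspected)."""
    return j_stat > chi2_inverse_approx(p_threshold, dof)


c = chi2_inverse_approx(0.95, 500)
print(round(c))                              # 553
print(global_test_fails(610.0, 500, 0.95))   # True: bad data suspected
print(global_test_fails(480.0, 500, 0.95))   # False: test passes
```

For small dof the approximation loses accuracy, so an exact quantile function from a statistics library would be the safer choice there.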
Detecting Which Data is Likely Bad
The chi-squared test is global only: it indicates that bad data is present, but not where. To identify which measurements are bad, we use a largest normalized residual test (the normalized residual is also called an internally studentized residual). We estimate the variance of each residual and divide the residual by the corresponding standard deviation, giving z = |r|/SigmaEst. In the absence of bad data, the signed normalized residual approximately follows a standard normal distribution. Specifically, we compute:
Start from the residual vector r and the weights w (our 1/sigma from What Is State Estimation).
Compute the measurement Jacobian J (the same as the full Jacobian but restricted to measurement rows). Note that J here denotes the Jacobian, not the test statistic of the global test.
Compute the Gain Matrix: G = J^T W^2 J, where W = diag(w)
Compute the Hat Matrix: H = J G^(-1) J^T W^2
The residual sensitivity matrix is then: S = I - H (Clamp the diagonal of H to be between 0 and 1)
Then the residual Covariance matrix is: Ω = S W^(-2)
And finally the normalized residuals: norm_res_i = |r_i| / sqrt(Ω_i,i)
If norm_res_i > DSE_BADDATA_RES_TOL, the measurement is marked as bad.
Note: If the computed covariance Ω_i,i is too small, we flag the measurement as bad and report a value of 1e+10 for it.
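The computation above can be sketched in plain Python for a small dense problem. This is a minimal illustration, not the DSE implementation: it computes only the diagonal of H, uses a naive Gaussian elimination, and the cutoff TINY_COVARIANCE and the tolerance used in the example are hypothetical values chosen for the demonstration.

```python
import math

TINY_COVARIANCE = 1e-12   # hypothetical cutoff for "covariance too small"
FLAG_VALUE = 1e10         # value reported when the covariance is too small


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def normalized_residuals(jac, r, w):
    """Return norm_res_i = |r_i| / sqrt(Omega_ii) for each measurement."""
    n_meas, n_state = len(jac), len(jac[0])
    # Gain matrix G = J^T W^2 J with W = diag(w).
    gain = [[sum(jac[i][a] * w[i] ** 2 * jac[i][b] for i in range(n_meas))
             for b in range(n_state)] for a in range(n_state)]
    out = []
    for i in range(n_meas):
        # H_ii = (J G^-1 J^T)_ii * w_i^2, clamped to [0, 1].
        x = solve(gain, jac[i])
        h_ii = sum(jac[i][a] * x[a] for a in range(n_state)) * w[i] ** 2
        h_ii = min(1.0, max(0.0, h_ii))
        omega_ii = (1.0 - h_ii) / w[i] ** 2  # diagonal of S W^-2
        if omega_ii < TINY_COVARIANCE:
            out.append(FLAG_VALUE)
        else:
            out.append(abs(r[i]) / math.sqrt(omega_ii))
    return out


# Three direct measurements of one state, each with sigma = 1 (w = 1).
measured = [1.0, 1.1, 4.0]
estimate = sum(measured) / len(measured)      # least-squares fit for equal weights
residuals = [m - estimate for m in measured]  # measured - estimated
z = normalized_residuals([[1.0], [1.0], [1.0]], residuals, [1.0, 1.0, 1.0])
res_tol = 2.0  # hypothetical stand-in for DSE_BADDATA_RES_TOL
print([value > res_tol for value in z])  # [False, False, True]
```

In this example each measurement has H_ii = 1/3, so Ω_i,i = 2/3, and only the third measurement exceeds the tolerance. A production implementation would typically use sparse linear algebra rather than solving one dense system per measurement.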
We report all the normalized residuals and bad data flags in the results tab of the State Estimation tool.