[Work Log]

August 03, 2013
Project Tulips
Subproject Data Association v2
Working path projects/​tulips/​trunk/​src/​matlab/​data_association_2
Unless otherwise noted, all filesystem paths are relative to the "Working path" named above.

Running optimizer...

Issue: Likelihood variance is collapsing to zero.

Guess: Training ML is different from naive ML and isn't penalizing noise correctly.

Test 1: save params, compare training ml and naive ml

Same result.

Discussion

The marginal likelihood is monotonically increasing as the noise variance \(\sigma_n^2\) approaches zero. This shouldn't be happening, because as the likelihood variance collapses to a delta, the marginal likelihood should become equal to the prior evaluated at the (virtual) observed locations.
\begin{align} \lim_{\sigma_n \to 0} p(D) &= \lim_{\sigma_n \to 0} \int p_0(x) \; p_l(D \mid x) dx & \text{(Marginalization)}\\ &= \lim_{\sigma_n \to 0} \int p_0(x) \; f(D - x) dx & \text{(Convolution)} \\ &= \int p_0(x) \delta(D - x) dx \\ &= p_0(D) \end{align}

This implies there's a bug in the code causing this phenomenon; the mathematical model is not the cause.

Solved

Found the bug. When computing the likelihood precision \(S\) from the unscaled precision \(S_0\), the noise variance \(\sigma_n^2\) was multiplied instead of divided:
S = S0 * sigma_n; // incorrect
S = S0 * (1/sigma_n); // corrected version

Resuming Training

Had some trouble with noise sigmas being too low. Solved by in two ways:

  1. Attempts 1-3: clamping to a minimum value and then adding a penalty depending on the amount that was clamped.
  2. Attempt 4: offset by minimum value before transforming

Command:

train_params_done = tr_train(test_Corrs_ll, train_params, data_, 400);

Attempt #1:

Handle extreme values by incuring a penalty for variances smaller than 1e-5. Result stored in train_params_done_1.

train_params_done_1 = 

            smoothing_variance: 0.0025
                noise_variance: 0.1231
             position_variance: 1.6676e+04
                 rate_variance: 0.2212
    perturb_smoothing_variance: 1
         perturb_rate_variance: 1
     perturb_position_variance: 1
                 perturb_scale: 1

Surprisingly small smoothness sigma. Surprisingly low rate variance.

Attempt #2

Testing sensitivity to the magnitude of the penalty term. Scaled up penalty by 1000. Results in train_params_done_2:

train_params_done_2 = 

            smoothing_variance: 0.0025
                noise_variance: 0.1231
             position_variance: 1.6676e+04
                 rate_variance: 0.2212
    perturb_smoothing_variance: 1
         perturb_rate_variance: 1
     perturb_position_variance: 1
                 perturb_scale: 1

Summary: no change.

Attempt #3

Testing sensitivity to the threshold where the penalty term starts to be incurred. Set MIN_NOISE_VARIANCE to 1e-3 and MIN_SMOOTHING_VARIANCE to 1e-7 (previously both 1e-5). Results in train_params_done_3:

train_params_done_3 = 

            smoothing_variance: 0.0025
                noise_variance: 0.1231
             position_variance: 1.6676e+04
                 rate_variance: 0.2212
    perturb_smoothing_variance: 1
         perturb_rate_variance: 1
     perturb_position_variance: 1
                 perturb_scale: 1

Summary: no change.

Attempt #4 Handled small variances by offsetting before transforming:

% converting param to state variable
x(1) = log(smoothing_variance - MIN_SMOOTING_VARIANCE);

% converting state variable to param
x(1) = exp(smoothing_variance) + MIN_SMOOTING_VARIANCE;

This is more elegant than the penalty hack used in the last three attempts. Results:

train_params_done = 

            smoothing_variance: 0.0025
                noise_variance: 0.1231
             position_variance: 1.6676e+04
                 rate_variance: 0.2212
    perturb_smoothing_variance: 1
         perturb_rate_variance: 1
     perturb_position_variance: 1
                 perturb_scale: 1

Summary: no change.

No-Perturb model summary:

Successfully trained model.
Required 115 function evaluations, taking 112 s.
Optimal marginal likelihood: 23714.937760.
Optimal parameters:

            smoothing_variance: 0.0025
                noise_variance: 0.1231
             position_variance: 1.6676e+04
                 rate_variance: 0.2212

Surprised at how small noise_variance was, considering the calibration noise. However, I guess the maximum-likelihood reconstruction looked pretty good, if unity-apect-ratio axis scaling is used.

Training OU-Perturb model

Getting weird results. Halted after second iteration; second iteration took 250 function evaluations; perturb_smoothing_variance gradient is zero. Results:

train_params_done = 

            smoothing_variance: 1.5003e-05
                noise_variance: 1.0087e-04
             position_variance: 7.9393e+04
                 rate_variance: 0.7879
    perturb_smoothing_variance: 0.2500
         perturb_rate_variance: 1.5183
     perturb_position_variance: 2.5159e+04
                 perturb_scale: 48.6163

Recall that new model ML's aren't deeply tested; probably a bug in there (in the kernel implementation? in the kernel theory? in the conversion from mats to kernel?). Will continue tomorrow.

Posted by Kyle Simek
blog comments powered by Disqus