[Work Log]

August 03, 2013

Project	Tulips
Subproject	Data Association v2
Working path	projects/tulips/trunk/src/matlab/data_association_2

Unless otherwise noted, all filesystem paths are relative to the "Working path" named above.

Running optimizer...

Issue: Likelihood variance is collapsing to zero.

Guess: Training ML is different from naive ML and isn't penalizing noise correctly.

Test 1: save params, compare training ML and naive ML.

Same result.

Discussion

The marginal likelihood increases monotonically as the noise variance $\sigma_n^2$ approaches zero. This shouldn't happen: as the likelihood collapses to a delta function, the marginal likelihood should converge to the prior evaluated at the (virtual) observed locations.

$\begin{align} \lim_{\sigma_n \to 0} p(D) &= \lim_{\sigma_n \to 0} \int p_0(x) \; p_l(D \mid x) dx & \text{(Marginalization)}\\ &= \lim_{\sigma_n \to 0} \int p_0(x) \; f(D - x) dx & \text{(Convolution)} \\ &= \int p_0(x) \delta(D - x) dx \\ &= p_0(D) \end{align}$

This implies there's a bug in the code causing this phenomenon; the mathematical model is not the cause.
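
As a sanity check on the limit above, consider a one-dimensional Gaussian case. This is chosen only for illustration: the zero prior mean and the prior variance $\sigma_0^2$ are assumptions, not the model used in this project. With $p_0(x) = \mathcal{N}(x; 0, \sigma_0^2)$ and $p_l(D \mid x) = \mathcal{N}(D; x, \sigma_n^2)$:

$\begin{align} p(D) &= \int \mathcal{N}(x; 0, \sigma_0^2) \; \mathcal{N}(D; x, \sigma_n^2) dx = \mathcal{N}(D; 0, \sigma_0^2 + \sigma_n^2) \\ \lim_{\sigma_n \to 0} p(D) &= \mathcal{N}(D; 0, \sigma_0^2) = p_0(D) \end{align}$

In this toy case the marginal likelihood converges to the finite value $p_0(D)$, as the general argument predicts.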

Solved

Found the bug. When computing the likelihood precision $S$ from the unscaled precision $S_0$, the noise variance $\sigma_n^2$ was multiplied instead of divided:

S = S0 * sigma_n; % incorrect
S = S0 * (1/sigma_n); % corrected version
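
A minimal sketch of why the direction of the scaling matters, using a made-up 2x2 precision matrix. The names `noise_variance`, `S_wrong`, and `S_fixed` and all numeric values are hypothetical and do not come from the project code. Dividing the unscaled precision by the noise variance gives a covariance that shrinks with the noise, while multiplying does the opposite:

```matlab
% Hypothetical example values; not taken from the project code.
S0 = [2 0; 0 4];            % unscaled precision
noise_variance = 0.01;      % sigma_n^2

S_wrong = S0 * noise_variance;   % the bug: precision shrinks as the noise shrinks
S_fixed = S0 / noise_variance;   % the fix: precision grows as the noise shrinks

% Covariance implied by each precision.
Sigma_wrong = inv(S_wrong);
Sigma_fixed = inv(S_fixed);

% The corrected covariance should equal noise_variance * inv(S0).
Sigma_expected = noise_variance * inv(S0);
scaling_error = max(abs(Sigma_fixed(:) - Sigma_expected(:)));

fprintf('scaling error (fixed) = %g\n', scaling_error);
fprintf('largest variance: wrong = %g, fixed = %g\n', ...
        max(diag(Sigma_wrong)), max(diag(Sigma_fixed)));
```

A check like this, comparing the implied covariance against `noise_variance * inv(S0)`, is one cheap way to catch an inverted scaling before running the optimizer.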

Resuming Training

Had some trouble with noise sigmas being too low. Solved in two ways:

Attempts 1-3: clamping to a minimum value and then adding a penalty depending on the amount that was clamped.
Attempt 4: offset by minimum value before transforming

Command:

train_params_done = tr_train(test_Corrs_ll, train_params, data_, 400);

Attempt #1:

Handle extreme values by incurring a penalty for variances smaller than 1e-5. Result stored in train_params_done_1.

train_params_done_1 = 

            smoothing_variance: 0.0025
                noise_variance: 0.1231
             position_variance: 1.6676e+04
                 rate_variance: 0.2212
    perturb_smoothing_variance: 1
         perturb_rate_variance: 1
     perturb_position_variance: 1
                 perturb_scale: 1

Surprisingly small smoothness sigma. Surprisingly low rate variance.

Attempt #2

Testing sensitivity to the magnitude of the penalty term. Scaled up the penalty by a factor of 1000. Results in train_params_done_2:

train_params_done_2 = 

            smoothing_variance: 0.0025
                noise_variance: 0.1231
             position_variance: 1.6676e+04
                 rate_variance: 0.2212
    perturb_smoothing_variance: 1
         perturb_rate_variance: 1
     perturb_position_variance: 1
                 perturb_scale: 1

Summary: no change.

Attempt #3

Testing sensitivity to the threshold where the penalty term starts to be incurred. Set MIN_NOISE_VARIANCE to 1e-3 and MIN_SMOOTHING_VARIANCE to 1e-7 (previously both 1e-5). Results in train_params_done_3:

train_params_done_3 = 

            smoothing_variance: 0.0025
                noise_variance: 0.1231
             position_variance: 1.6676e+04
                 rate_variance: 0.2212
    perturb_smoothing_variance: 1
         perturb_rate_variance: 1
     perturb_position_variance: 1
                 perturb_scale: 1

Summary: no change.
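
The clamp-plus-penalty approach from Attempts 1-3 can be sketched roughly as follows. The actual penalty function is not recorded in this log, so the quadratic log-space penalty, `PENALTY_WEIGHT`, and the proposed value are all assumptions for illustration:

```matlab
% Sketch only: the penalty form and weight are assumed, not the real code.
MIN_NOISE_VARIANCE = 1e-5;   % threshold used in Attempts 1-2
PENALTY_WEIGHT = 1;          % hypothetical; Attempt #2 scaled the penalty by 1000

proposed = 2e-6;             % made-up value proposed by the optimizer

% Clamp to the minimum before evaluating the model.
clamped = max(proposed, MIN_NOISE_VARIANCE);

% Amount that was clamped, measured in log space (zero when no clamping occurs).
excess = log(clamped) - log(proposed);

% Assumed penalty, to be subtracted from the objective being maximized.
penalty = PENALTY_WEIGHT * excess^2;

fprintf('clamped = %g, excess = %g, penalty = %g\n', clamped, excess, penalty);
```

When the proposed value is already above the threshold, `excess` and `penalty` are both zero, so the penalty only matters near the boundary. That is consistent with the trained values above being unchanged across Attempts 1-3.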

Attempt #4

Handled small variances by offsetting before transforming:

% converting param to state variable
x(1) = log(smoothing_variance - MIN_SMOOTHING_VARIANCE);

% converting state variable to param
smoothing_variance = exp(x(1)) + MIN_SMOOTHING_VARIANCE;

This is more elegant than the penalty hack used in the last three attempts. Results:

train_params_done = 

            smoothing_variance: 0.0025
                noise_variance: 0.1231
             position_variance: 1.6676e+04
                 rate_variance: 0.2212
    perturb_smoothing_variance: 1
         perturb_rate_variance: 1
     perturb_position_variance: 1
                 perturb_scale: 1

Summary: no change.
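
A round-trip sketch of the offset transform from Attempt #4. It shows that the mapping is invertible and that a very negative state still yields a variance above the minimum. The helper names `to_state` and `to_param` are hypothetical, and the minimum of 1e-5 is an assumed value:

```matlab
% Sketch only: helper names and the minimum value are assumptions.
MIN_SMOOTHING_VARIANCE = 1e-5;

to_state = @(v) log(v - MIN_SMOOTHING_VARIANCE);   % param -> unconstrained state
to_param = @(x) exp(x) + MIN_SMOOTHING_VARIANCE;   % state -> param, never below the minimum

v = 0.0025;              % trained smoothing_variance reported in this log
x = to_state(v);
v_back = to_param(x);    % should reproduce v up to rounding error

v_low = to_param(-20);   % a very negative state still respects the lower bound

fprintf('round-trip error = %g, v_low = %g\n', abs(v_back - v), v_low);
```

Because the bound is built into the parameterization, the optimizer can move freely in `x` without any clamping or penalty term.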

No-Perturb model summary:

Successfully trained model.
Required 115 function evaluations, taking 112 s.
Optimal marginal likelihood: 23714.937760.
Optimal parameters:

            smoothing_variance: 0.0025
                noise_variance: 0.1231
             position_variance: 1.6676e+04
                 rate_variance: 0.2212

Surprised at how small noise_variance was, considering the calibration noise. However, I guess the maximum-likelihood reconstruction looked pretty good when unit-aspect-ratio axis scaling is used.

Training OU-Perturb model

Getting weird results. Halted after second iteration; second iteration took 250 function evaluations; perturb_smoothing_variance gradient is zero. Results:

train_params_done = 

            smoothing_variance: 1.5003e-05
                noise_variance: 1.0087e-04
             position_variance: 7.9393e+04
                 rate_variance: 0.7879
    perturb_smoothing_variance: 0.2500
         perturb_rate_variance: 1.5183
     perturb_position_variance: 2.5159e+04
                 perturb_scale: 48.6163

Recall that the new models' MLs aren't deeply tested; there is probably a bug in there (in the kernel implementation? in the kernel theory? in the conversion from mats to kernel?). Will continue tomorrow.

Posted by Kyle Simek


KLS Research Blog ← Nothing to see here...
