[Work Log] Theoretical Rate variance bug; Training background curve model

August 13, 2013
Project Tulips
Subproject Data Association v2
Working path projects/tulips/trunk/src/matlab/data_association_2
Unless otherwise noted, all filesystem paths are relative to the "Working path" named above.

Curve reversing thoughts

The reversed curve issue only really matters during training. Our tests show that curve-flip moves will mix well, even if the maximum isn't always correct. Adding connections between parent and child curves should resolve these issues.

During training, derive curve direction from ground-truth.
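Roughly, something like this (a minimal sketch; the curve representation and variable names below are assumptions, not our actual data structures):

 % Sketch: orient a training curve to agree with its ground-truth counterpart.
 % Assumes curve_est and curve_gt are N-by-2 point lists in the same parameterization.
 est_dir = curve_est(end,:) - curve_est(1,:);   % endpoint-to-endpoint direction
 gt_dir  = curve_gt(end,:)  - curve_gt(1,:);
 if dot(est_dir, gt_dir) < 0
     curve_est = flipud(curve_est);             % reverse the point order
 end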

Theoretical Rate Variance (take 2)

Realized that my last attempt at this had two bugs:

  1. I used rand() instead of randn().
  2. I was normalizing by the squared vector magnitude.

Fixing this:

dir = randn(3,10000000);
dir = bsxfun(@times, dir, 1./sqrt(sum(dir.^2)));
var(dir(:))

    ans =
        0.3332

Compare this to the earlier theoretical result of ~0.23.

This new result is interesting because it is about 25% higher than the empirical results we've been getting. My guess is that the fact that all of our curves point upward reduces the variance. To test this, force all directions into the top hemisphere:

 dir = randn(3,10000000);
 dir(3,:) = abs(dir(3,:));
 dir = bsxfun(@times, dir, 1./sqrt(sum(dir.^2)));  % normalize each column to unit length

 var(dir(:))

    ans =
        0.3149

Yep. And in practice, our curve directions span an even smaller range.

...

Repeating for the 2D case:

dir = randn(2,10000000);
dir = bsxfun(@times, dir, 1./sqrt(sum(dir.^2)));
var(dir(:))

    ans =
        0.5000

This strongly suggests that the per-component variance of a uniform unit direction in D dimensions is 1/D, which makes sense: the squared components sum to 1 and each has zero mean, so by symmetry each component contributes variance 1/D.
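A quick check of the 1/D pattern for a few dimensions (same normalization as above, smaller sample to keep it fast):

 for D = [2 3 4 5 10]
     dir = randn(D, 1000000);
     dir = bsxfun(@times, dir, 1./sqrt(sum(dir.^2)));  % unit-length columns
     fprintf('D = %2d: var = %.4f, 1/D = %.4f\n', D, var(dir(:)), 1/D);
 end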

Connection test

Does connecting each of the curves result in a better ML? Do we need to marginalize?

Training background model

Construct the training ML for the background model; construct mats for the bg curve models.

Result: train/tr_train_bg.m.

     position_mean_2d: [2x1 double]
 position_variance_2d: 1.7837e+04
     rate_variance_2d: 0.5000
    noise_variance_2d: 9.4597e-04
smoothing_variance_2d: 0.0157

Interesting that noise_variance_2d is so low: 9.4597e-04 corresponds to a standard deviation of roughly 0.03 px, whereas we expected noise on the order of 1 pixel. More discussion on this later.

I retrained the BG model on only the foreground curves and evaluated the ML under it.
     position_mean_2d: [2x1 double]
 position_variance_2d: 5.8116e+03
     rate_variance_2d: 0.5000
    noise_variance_2d: 4.8105e-04
smoothing_variance_2d: 0.0053

Smaller noise variance, smaller smoothing variance. This shrinking of the variances with a smaller training set is typical overfitting behavior; not of much concern.

Here's the comparison against the ML for the trained foreground model.

bg model = 4611.886746
fg model = -8049.097873

Not good. The FG model on true foreground curves should have a better marginal likelihood than the same curves under the BG model.

Some questions

This warrants further investigation.

Other observations

If I force the noise variance to be equal to that of the fg model (0.72), the bg ML drops significantly (fg results reprinted for convenience):

bg model = -9413.1
fg model = -8049.097873

Now we're back in business. This is a good sanity check, but it doesn't explain why we can't get similar noise variances for both models when training.
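For reference, the forcing step is just something like this (a sketch: bg_train_params stands in for whatever the trained bg parameter struct is actually called, and init_data_curves_ml is the reference ML routine used below under "ML Validity Testing", so its numbers won't match the training implementation's exactly):

 bg_params_forced = bg_train_params;           % trained bg parameters (assumed variable name)
 bg_params_forced.noise_variance_2d = 0.7204;  % pin to the fg model's noise variance
 data_forced = init_data_curves_ml(data_, bg_params_forced);
 sum([data_forced.curves_ml{:}])               % compare against the fg ML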

Possibly the smoothing variance would need to change in this case. Retraining, with noise_variance forced to 0.72:

     position_mean_2d: [2x1 double]
 position_variance_2d: 5.8116e+03
     rate_variance_2d: 0.5000
    noise_variance_2d: 0.7204
smoothing_variance_2d: 2.7394e-05

Smoothing variance dropped dramatically. ML comparison (fg results reprinted for convenience):

bg model = -9214.632874
fg model = -8049.097873

Note that the bg ML didn't change significantly (about 2%) after re-optimizing the smoothing variance, even though the smoothing variance itself changed dramatically.

It might be worthwhile to visualize the optimal fits with these parameters. Are we oversmoothing? Undersmoothing? Either would suggest a bug.

ML Validity Testing

Running the reference ML:

 data_2 = init_data_curves_ml(data_, bg_train_params_done_force)
 sum([data_2.curves_ml{:}])

    ans =
      -5.1552e+04

Very different from the training implementation. Need to dig deeper to determine the cause.
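One way to dig in: compare the two implementations curve by curve rather than by the summed totals. A sketch, assuming the training side exposes its per-curve MLs somewhere (ml_train below is a placeholder for those values; not verified):

 ml_ref = [data_2.curves_ml{:}];     % per-curve MLs from the reference implementation above
 gap = ml_ref - ml_train;            % ml_train: per-curve MLs from the training code (placeholder)
 [worst, idx] = max(abs(gap));
 fprintf('largest per-curve gap: %g (curve %d); total gap: %g\n', worst, idx, sum(gap));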

Misc Thoughts

Do we need to re-estimate the index set during training of the FG model?

Iterate: train, re-index, repeat.
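If we go that route, the loop is roughly this (every name below is a placeholder, not an actual function in the codebase):

 index_set = initial_index_set;                        % placeholder: current index estimate
 max_iters = 5;                                        % arbitrary cap
 for iter = 1:max_iters
     fg_params = train_fg_model(data_, index_set);     % placeholder: FG training step
     index_set = estimate_index_set(data_, fg_params); % placeholder: re-estimate the index set
 end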

TODO

Posted by Kyle Simek