In yesterday's test, I hadn't realized that the training ML didn't include all of the curves, only the foreground curves were included. Rerunning:
Reference ML: -5.1552e+04
Training ML: -5.0786e+04
Okay, we're back in the ballpark -- Within 1.5% of the reference.
Found the other issue: the roundabout way I was using to generate the training covariance matrices was ignoring the user-specified position_variance_2d. Results now match:
Reference ML: -5.1552e+04
Training ML: -5.1552e+04
Unfortunately, none of this explains why training is causing noise variance to collapse so low. All discovered problems were merely bugs in the validation logic. At least we know the training ML logic is valid.
...
Quick inspection shows that 2D curves are, indeed, pre-smoothed. This means that noise variance can collapse to near-zero when fitting 2D curve.
The new question becomes: why doesn't it occur in the foreground (3D) model?
...
Try re-indexing...
% update params related to smoothing variance
params_2 = tbc_.params;
params_2.smoothing_variance = training_results{1}.smoothing_variance;
params_2.noise_variance = training_results{1}.noise_variance;
% re-construct likelihood (including indicies)
test_Corrs_ll_2 = tr_prep_likelihood(test_Corrs, data_, params_2);
% re-run training
train_params_done = tr_train(test_Corrs_ll_2, training_results{2}, 400, 3);
Results
smoothing_variance: 0.0011
noise_variance: 0.6846
position_variance: 1.3597e+04
rate_variance: 0.3042
perturb_smoothing_variance: 7.1854e-19
perturb_rate_variance: 1.3013e-06
perturb_position_variance: 0.6621
perturb_scale: 2.9795
Final ML: -7840.140322
Compare against results prior to re-indexing: version:
smoothing_variance: 0.0019
noise_variance: 0.7204
position_variance: 1.6111e+04
rate_variance: 0.2465
perturb_smoothing_variance: 7.1854e-19
perturb_rate_variance: 1.1296e-06
perturb_position_variance: 0.5931
perturb_scale: 2.4654
Final ML: -8049.9e+03
So we've improved, but nowhere near the background model's ML of 4611.9. Note that curves got smoother and less noisy. More correlation, more variance pushed into the perturbations. (why the f*** perturb_smoothing_variance is just sitting there like an idiot is still beyond me).
Need to visualize ll_means against smoothed curve. The perturbations should be correlated. Maybe even plot them? Note the perturbations need to be considered only in the directions parallel to the image plane. This has never been done, and is necessary to validate the index-estimation in correspondence/corr_to_likelihood
. Can we visualize after removing rate and offset components? Yes: difference between ll_means and per-view reconstructed curve.
Consider smarter smoothing in corr_to_likelihood
-- using posterior max instead of csaps
. Could give better index estimation if the visualization test shows problems.
What if we were using a 3D likelhood for background curves too? Could we still expect the BG ML to be insanely high, and the noise variance to be insanely peaked? Backproject, estimate 3d noise variance, estimate index set. The farther away it gets, the more variance in the correlated points. Which means lower ML, right? But training will push variance lower.
Note that larger noise variance in FG model will explain away bad triangulations. Perturb variances also explain to some extent, but maybe they aren't sufficient to explain enough of it.
Why is perturb_smoothing_variance basically zero? If it was higher, it could explain more of the traingulation error, and allow noise variance to drop. Should we be using a different pertubation model? Maybe Brownian motion instead of integrated brownian motion? Visualizing perturbations would be informative here.
Do smooth perturbations follow a different dynamics model than linear and offset perturbations? Can we force it to be larger? what if it was the only option for modelling perturbations? It's true that a small amount of smoothing variance can result in a huge amount of marginal point variance, and large variances kill ML's. Probably a mean-reverting model is more sensible -- Ornstein Ulenbeck process, perhaps? Or SqExp? I avoided these in the past, because it changes the form of the marginal curve covariances -- they're no longer purely cubic-spline processes. But I never considered the fact that we need to model triangulation error.
Observations: setting perturb_smoothing_variance to exactly zero has no change in ML.
Consider tying foreground and background noise variance during training. <--- This is the most pragmatic solution. Avoids getting mired in details, and acknowledges what we know to be true: image noise arises from the same process in foreground and background models.
Possibly the fact that we allow a nonzero position_mean in 2D but not in 3D is the issue?
Got it! In corr_to_likelihood.m
, we have two parameters that determine how fine-grained the sampling is along the smoothed curve. Each observed curve is then matched against the sampled curve. The sampling period is 2 pixels, which means there's an average error of about 1 pixel in each index estimate.
Reducing the sampling period to 1 pixel and re-training gives:
smoothing_variance: 0.0014
noise_variance: 0.3986
position_variance: 1.3555e+04
rate_variance: 0.3100
perturb_smoothing_variance: 7.1854e-19
perturb_rate_variance: 3.3992e-04
perturb_position_variance: 0.6874
perturb_scale: 2.3823
Final ML: -6357.585946
Although we only slightly changed the sampling period, the final ML improved significantly. The noise variance dropped from 0.7 to 0.4, too. Perturb rate variance changed by 2 orders of magnitude!
Reducing sampling period to 0.5:
smoothing_variance: 0.0018
noise_variance: 0.3060
position_variance: 1.3499e+04
rate_variance: 0.3098
perturb_smoothing_variance: 7.1854e-19
perturb_rate_variance: 4.3411e-04
perturb_position_variance: 0.8152
perturb_scale: 2.4095
Final ML: -5664.35
The upward ML trend continues, but the noise variance appears to be flattening out. The perturb_position_variance jumped upward unexpectedly.
This might explain all of the disparity between the 2D and 3D noise variances. Unfortunately, we can't reduce the sampling period to 0.0004, because the runtime complexity of the matching is O(N2), where N is the number of sampled points.
Better idea: after finding the optimal region in the matching algorithm, improve it by projecting the point onto the line segments neighboring the matched point. Constant-time, and significantly better!
Another thought: if the rasterization error was approx. 1 pixel, the sampling error could be reduced to the point where the rasterization error dominated (possibly a sampling period of 0.5 or 0.01 would achieve this). That way, both 2D and 3D noise sigma would be dominated by rasterization error, and would train to similar values.