The original plan was to Markov-decompose the prior and posterior, but I soon remembered that I also need to compute the mean of the posterior, which isn't as straightforward to compute.
After some deliberation and whiteboarding, I decided to implement Bishop's forward/backward algorithm to simultaneously marginalize and compute the maximum. I will test this against the clique-tree implementation to ensure correctness.
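To make the forward/backward idea concrete, here is a minimal sketch for a scalar Gaussian chain. It is not the actual implementation: it assumes a density proportional to exp(-0.5 x'Jx + h'x) with a symmetric tridiagonal, positive-definite precision J, and the names `chain_forward_backward`, `diag`, `off`, and `h` are made up for illustration. The forward pass eliminates one node at a time (accumulating the log-determinant needed for marginalization), and the backward pass recovers the mean, which for a Gaussian is also the maximum.

```python
import numpy as np

def chain_forward_backward(diag, off, h):
    """Mean (= mode) and log|J| for a Gaussian chain.

    Hypothetical setup: density proportional to exp(-0.5 * x @ J @ x + h @ x),
    where J is symmetric tridiagonal with main diagonal `diag` and
    first off-diagonal `off` (length n - 1).
    """
    n = len(diag)
    d = np.empty(n)  # pivots: precision of node i after eliminating nodes < i
    g = np.empty(n)  # potential vector after forward elimination

    # Forward pass: eliminate node i - 1 and pass its message to node i.
    d[0], g[0] = diag[0], h[0]
    for i in range(1, n):
        w = off[i - 1] / d[i - 1]
        d[i] = diag[i] - w * off[i - 1]
        g[i] = h[i] - w * g[i - 1]

    # Backward pass: back-substitute to recover the mean of every node.
    mean = np.empty(n)
    mean[-1] = g[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        mean[i] = (g[i] - off[i] * mean[i + 1]) / d[i]

    # The pivots multiply to det(J), which the normalization constant needs.
    logdet = np.sum(np.log(d))
    return mean, logdet
```

A dense solve of J x = h gives the same mean, so comparing the two is the same kind of check as testing against the clique-tree code, at O(n) cost instead of O(n^3).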
........
During testing I realized a fundamental problem with the current likelihood: there is no obvious way to consistently handle the "redundant" dimensions that arise from duplicated indices. In some kernels, these are handled naturally, namely the kernels that treat different views as different indices. The old kernel, however, does not, and the duplicated indices result in degenerate posterior distributions unless handled using hacks.
We could handle this on a kernel-by-kernel basis, but it would be unmaintainable, and could hinder further research into new kernels. It also isn't clear that the hacks I have in mind would actually give correct results.
This problem is inherent to the "candidate's estimator" of the marginal likelihood, because it involves a ratio of the posterior and prior, both of which are degenerate in these cases.
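For reference, the estimator rests on the identity log p(y) = log p(y | theta) + log p(theta) - log p(theta | y), which holds at any theta where the densities are defined. The sketch below checks it on a toy scalar model, not the curve model discussed here: a zero-mean Gaussian prior on theta and Gaussian observation noise. All function and parameter names are hypothetical.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def candidates_log_marginal(y, theta, prior_var, noise_var):
    """log p(y) via likelihood * prior / posterior, evaluated at theta.

    Toy model (an assumption for illustration):
        theta ~ N(0, prior_var),  y | theta ~ N(theta, noise_var).
    """
    # Conjugate posterior over theta given the single observation y.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * y / noise_var
    return (log_normal(y, theta, noise_var)
            + log_normal(theta, 0.0, prior_var)
            - log_normal(theta, post_mean, post_var))

# In this toy model the exact marginal is y ~ N(0, prior_var + noise_var).
exact = log_normal(1.3, 0.0, 2.0 + 0.5)
estimate = candidates_log_marginal(1.3, theta=0.7, prior_var=2.0, noise_var=0.5)
```

If `prior_var` collapses toward zero, the posterior variance collapses with it, and the prior and posterior terms both diverge. That is the scalar analogue of the degenerate ratio described above.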
The actual marginal likelihood function has no such degeneracies, because each observation is independent, given the underlying curve. Until recently, however, it was unclear how to evaluate the marginal likelihood using the approximated likelihood function: the approximate likelihood has a rank-deficient precision matrix, and its normalization constant is non-standard due to the transformation from 2D to 3D. I think I've now developed a way to evaluate it, which I describe in the next article.
Posted by Kyle Simek