Despite all of the elaborate scaffolding in the pipeline, we're essentially solving a chicken-and-egg problem: we're trying to estimate two things at once, the point indices (correspondences between curve points across views) and the point depths.
We currently use a multi-step approach to estimate both using elaborate heuristics, and the discussion over the last week or so shows that this approach is less than satisfying. The current approach is (roughly):
Steps 1 and 2 get an initial 3D estimate.
Steps 3 through 6 try to optimize the posterior reconstruction.
Finally, step 7 gives us our first estimate of the point indices, and step 8 gives us the point depths.
This approach has several problems. Forcing a point correspondence is unnatural: points lie on a continuous index set and don't correspond exactly. Jagged tails cause correspondence failures, which prevent triangulation, so we need special code to handle them after the fact.
Index estimates don't account for between-view perturbations; this rigidity causes bad index estimates when perturbations are large.
Index estimation in 2D can be ambiguous; nearby points in 2D can be far apart in 3D.
Index estimation using the DTW algorithm doesn't handle smoothness.
Each stage is full of hacks.
Is there a better way?
Revised algorithm:
In the first pass, XXX can be a "scene center" point, or some heuristic (steps 1 and 2 above). In future passes, the maximum posterior can be used.
Getting the indices right is essential, and early on it is more important than getting the depths, since the depths have infinite variance in the backprojection direction.
Index optimization will use quasi-Newton optimization, a local minimizer, so a reasonable initial estimate is needed. Hopefully a simple linear map between endpoints will suffice; otherwise one of the existing heuristics could be used. I currently have a gradient for the Newton algorithm but still need to implement the Hessian, which should be computable in \(O(n^3)\). A sketch of the setup follows below.
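To make that concrete, here is a minimal sketch in Python of the linear-map initialization and the quasi-Newton refinement. The objective `neg_log_posterior` over the index vector and its gradient are hypothetical placeholders for the model's actual functions:

```python
import numpy as np
from scipy.optimize import minimize

def linear_initial_indices(n, t_start, t_end):
    """Initial index estimate: a simple linear map between the matched endpoints."""
    return np.linspace(t_start, t_end, n)

def optimize_indices(neg_log_posterior, neg_log_posterior_grad, t0):
    """Quasi-Newton (BFGS) local refinement of the index estimates.

    neg_log_posterior / neg_log_posterior_grad are placeholders for the model's
    objective over the index vector and its gradient.  Once the O(n^3) Hessian
    is available, this could switch to a true Newton method (e.g.
    method='Newton-CG' with hess=...).
    """
    result = minimize(neg_log_posterior, t0,
                      jac=neg_log_posterior_grad,
                      method='BFGS')
    return result.x
```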
For speed, this procedure could be applied incrementally to get reasonable initial estimates when adding an extra curve to a track: just skip re-estimating the depths and indices of the existing points. If the addition is accepted, re-estimate all points and re-accept; see the sketch below.
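A rough sketch of that control flow, with the helpers passed in as callables; `estimate_new_points`, `accept`, and `reestimate_all` are illustrative names, not existing functions in the pipeline:

```python
def add_curve_incremental(track, curve, estimate_new_points, accept, reestimate_all):
    """Cheap incremental proposal for adding a curve to a track.

    All three helpers are hypothetical placeholders for existing machinery:
    estimate_new_points  estimates depths/indices of the new curve's points only,
    accept               runs the acceptance test,
    reestimate_all       re-estimates every point with the new curve included.
    """
    proposal = estimate_new_points(track, curve)   # existing points untouched
    if not accept(track, curve, proposal):
        return None                                # cheap rejection
    refined = reestimate_all(track, curve)         # full re-estimation
    return refined if accept(track, curve, refined) else None
```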
After some playing around, I'm nearly certain the Hessian can be computed in cubic time (i.e., the same as the gradient)!
The first derivative (gradient) has two terms: the exponential term and the partition function term. Thus, the Hessian also has two terms, and a cubic-time formula for the first is relatively easy to derive.
The Hessian of the partition function is much messier, but we can boil it down to the following statement: each element of the Hessian is the trace of a vector outer product, which is equivalent to a dot product of the same two vectors. Assuming we can obtain these vectors in amortized linear time, we can compute all elements of the Hessian in cubic time.
It remains to show that we can compute all of the vectors in amortized linear time. Each vector takes \(O(n^2)\) to compute (multiplication of an \(n \times n\) matrix and an \(n \times 1\) vector). There are \(O(n)\) different vectors in total, which can be cached, for a total of \(O(n^3)\) precomputation. Amortized over the \(n^2\) times we use these vectors, the per-element running time is \(O(n)\). A sketch of the bookkeeping is below.
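Here is a small NumPy sketch of that caching pattern, assuming a hypothetical matrix \(M\) and per-index vectors \(v_i\) standing in for the actual quantities from the partition-function derivatives:

```python
import numpy as np

def partition_function_hessian(M, V):
    """Illustration of the caching trick, not the real derivative code.

    M : (n, n) matrix appearing in each second-derivative term (placeholder)
    V : (n, n) array whose i-th column is the vector v_i for index i (placeholder)

    Precomputing w_i = M v_i for all i costs O(n^3) once; each Hessian entry
    is then a dot product of two cached vectors, i.e. O(n) per entry and
    O(n^3) overall.
    """
    n = V.shape[1]
    W = M @ V                      # cache all w_i = M v_i in one O(n^3) step
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # tr(w_i v_j^T) collapses to the dot product v_j . w_i
            H[i, j] = V[:, j] @ W[:, i]
    return H
```

The point is only the complexity pattern: one cubic precomputation, then linear work per Hessian entry, matching the amortized argument above.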
It's going to be a bear implementing this, but the payoff will be huge. First, we'll get efficient local optimization, with each iteration costing about as much as a (naively implemented) MCMC iteration would. Second, we can finally properly (approximately) integrate out the index set in our marginal likelihood computation, using the Hessian and the Candidate's estimator; a sketch of that computation is below.
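For the second payoff, here is a minimal sketch of a Laplace-style version of the Candidate's estimator, assuming we already have the log joint density and the Hessian of the negative log posterior evaluated at the index-set mode (the argument names are illustrative, not existing code):

```python
import numpy as np

def log_marginal_candidates(log_joint_at_mode, hessian_at_mode):
    """Laplace-style Candidate's estimate of the log marginal likelihood.

    log_joint_at_mode : log p(data, indices*) at the index-set mode (scalar)
    hessian_at_mode   : Hessian of -log p(data, indices) at the mode, (n, n)

    Approximating the posterior over indices by a Gaussian with precision H,
    its density at the mode is (2*pi)^(-n/2) * det(H)^(1/2), so the Candidate's
    identity p(data) = p(data, indices*) / p(indices* | data) gives
        log p(data) ~= log p(data, indices*) + (n/2) log(2*pi) - (1/2) log det H.
    """
    n = hessian_at_mode.shape[0]
    sign, logdet = np.linalg.slogdet(hessian_at_mode)
    assert sign > 0, "Hessian must be positive definite at a local mode"
    return log_joint_at_mode + 0.5 * n * np.log(2.0 * np.pi) - 0.5 * logdet
```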
Posted by Kyle Simek