Processing math: 100%

[Reference] Graph matching with epipolar constraints

December 29, 2014

What follows is a variation on the work of Serradell et al. (CVPR 2012), which we modify to introduce an epipolar constraint and use a covariance function based on geodesic distance. The epipolar constraint requires significant rework of the original derivation, which assumes the x and y dimension are independent. The use of geodesic distance is necessary to handle parallax motion not present in Serradell's data.

See also (Yu, Tian, Liu, 2007) for an earlier application of GP for point matching, including expressions for marginal likelihood derivatives for training.

Consider a 3D tree structure observed in two views as 2D trees. We seek a correspondence between points in the 2D trees. We treat this as a problem of matching graphs embedded in R2, which we solve using nonlinear Gaussian process regression, using epipolar constraints to ensure the result is consistent with the known camera configuration.

Consider a embedded graph in two views, A and B, with vertex positions XA={xA1,,xAMA} and XB={xB1,,xBMB}. We model the graph in B as arising from an affine transform of the graph in A plus a nonlinear deformation. The goal is to find a correspondence between the graph vertices that minimizes the distortion between them. Because the graph structures may be noisy, we seek an approach that is robust to graphs with minor differences in topology.

Epipolar constraints

We can use the known relationship between two cameras to constrain we expect points XA to appear in view B. The fundamental matrix F between view A and view B is a 3x3 matrix that constrains points in XA to lie on their epipolar line in view B. Because we expect both the points and the fundemental matrix to include some error, we relax this constraint and assume each point lies near its epipolar line with variance σ2e.

For any point x in view A, the epipolar line in view B is given by l=FT[x1]. The normal vector perpendicular to the epipolar line is ˆn=[l1l2]/[l1l2]. The penalized epipolar constraint can be expressed as a Gaussian distribution N(e,Σe), where e is the epipole in view B, and Σ1e=(ˆnˆn)/σ2e. Note that the precision matrix Σ1e is rank-deficient, representing infinite variance along the epipolar line.

The joint likelihood of all points XA={xA1,,xAM} with epipolar normals {ˆn1,,ˆnM} is N(μE,ΣE), where

μE=[ee]Σ1E=SSS=(1/σe)(ˆn1000ˆn2000ˆnM).

Nonrigid deformation prior

We assume the motion of points from view A to view B arise as an affine transformation plus some nonrigid deformation. The main cause of nonrigid deformation is parallax motion, but other less predictable factors like camera lens distortion of small scene movements are possible too.

Assuming motion in the x and y direction are independent, single-dimensional prior covariance between the ith and jth points is

k(i,j)=θ0+θ1xTixj+θ2exp{θ32xixj2}+θ4exp{θ52d(vi,vj)}.(1)

Here, d(vi,vj) is the geodesic graph distance, i.e. the closest path between vertices vi and vj. The first three terms are the same as Serradell et al. -- the first two terms define an affine transformation and the third is a nonlinear distortion term that penalizes local geometry in A being distorted in B. However, we recognize that points with dissimilar 2D position may appear similar in 2D due to projection. The significant parallax shift between such points violates the locality assumption of the third covariance term. The fourth term in (1) introduces deformation based on geodesic distance. We observe that relative geodesic distances are better preserved under projection than relative Euclidean distances. This term provides a better explanation for parallax shift between points at similar Euclidean positions but dissimilar positions in the graph.

Equation (1) applies to one-dimensional points; we need an expression relating two-dimensional points. The x and y dimensions are independent under the prior, so the covariance matrix over 2D points is simply a block-diagonal matrix of two covariance matrices over 1D points. We permute this matrix so dimensions are ordered first by increasing point index, then spatial dimension. Formally, let XI and XJ be the vertical concatenation of 2D points with indices I={i1,,im} and J={j1,,jn}. Define k(I,J) as the gram matrix of covariances k(i,j) for each pair (i,j)I×J. We define the prior covariance between point sets XI and XJ as

K(I,J)=k(I,J)I2,

where denotes Kronecker product and I2 is a 2x2 identity matrix. The result is a 2|I|×2|J| covariance matrix over a sequence of stacked 2D variables with independent dimensions.

Correspondence matching

To estimate correspondences between XA and XB, we follow the coarse-to-fine matching strategy in Serradell et al. (2012). We first find a coarse set of correspondences between branch points and tips using the method below. A fine-grained correspondence is then found between the densely spaced remaining points, conditioned on the coarse correspondences.

Given a (possibly empty) set of N correspondences, we can derive a posterior distribution over possible locations for points XA in view B. For a given correspondence of size N, let XBN be the points in graph B corresponding to points in A with indices JN={j1,,jN}. Let J={1,,M} be the set of all vertex indices in graph A. The marginal posterior for point xAi is

μN(i)=KTC1N(XBNμE)σ2N(i)=K(i,i)+σ2nI2KTC1NK, whereC1N=(K(JN,JN)+σ2nIK(J,JN)K(JN,J)K(J,J)+ΣE)1K=K(JN,i)

However, because Σ1E is rank-deficient, we must rewrite C1N as

C1N=(I00S)(K(JN,JN)+σ2nISK(J,JN)K(JN,J)SSK(J,J)S+I)1(I00S)

Note that despite Σ1E being rank-deficient, the expression above is non-singular.

Using the expression above, we can compute the z-score of each point in B against the image of each point in A. Starting with an empty correspondence set, we construct a set of correspondence candidates whose z-score is below a threshold. We use z-score as weights to sample a correspondence from that set, and use that correspondence to update the z-scores of all remaining candidates by conditioning on the new correspondence set. We repeat until no valid candidates exist. Unclaimed vertices are treated as outliers. We repeat this process several times and keep the best fit. To avoid spending time on bad matches, we quit early if the relative geodesic distances between corresponding graphs differ too greatly.

Fine-grained correspondences between remaining points can be found using the Hungarian algorithm using pairwise z-score as a cost metric.

3D reconstruction

The simplest approach to 3D reconstruction is to triangulate the corresponding points in two views. Multiple views can be chained together from pairwise correspondence using transitivity. A better approach would be to use my BGP model to find an optimal registration between sequences of recovered 3D trees. I don't know if I could fit both the above and that into a single conference paper, because the temporal BGP model is a bit complicated. I have a simpler model in mind that might work better, and might be easier to squeeze into the end of a paper.

Results

None yet, working on it.

Still unclear how best to do evaluation. The original paper evaluated against synthetic data generated by VacuSynth, which we might be able to use too. They used mean squared error between estimated and true position, which I'm not thrilled with, because it doesn't explicitly capture whether the topology is matched correctly.

If Dr. Tabb's software works reasonably well, we could measure reconstruction quality and compare our reconstruction against hers. Ground truth would be the triangulated 2D ground truth, smoothed with hand-tuned BGP.

We could develop

Precision/recall against ground-truth curves.

Posted by Kyle Simek