Today, I'll cover a few issues left out of the earlier reference post on this topic. First, the original gradient derivation was missing a term corresponding to the normalization constant (which isn't actually constant as a function of the index set). Second, the previous write-up assumed 1-dimensional data; today we'll generalize the formulae to three dimensions.
Recall the Gaussian equation, as a function of indices:
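(A sketch, assuming the standard zero-mean multivariate normal over the k selected indices, with covariance Σ; the symbols here are my stand-ins for the post's notation:)

$$Z(x) = (2\pi)^{-k/2} \det(\Sigma)^{-1/2} \exp\!\left(-\tfrac{1}{2}\, x^\top \Sigma^{-1} x\right)$$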
Taking the log gives
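(Continuing the sketch:)

$$\log Z(x) = -\tfrac{k}{2}\log(2\pi) \;-\; \tfrac{1}{2}\log\det(\Sigma) \;-\; \tfrac{1}{2}\, x^\top \Sigma^{-1} x$$

The $-\tfrac{1}{2}\log\det(\Sigma)$ term is the normalization contribution that was missing from the original derivation, and it varies with the index set because Σ does.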
According to equation (38) of The Matrix Cookbook, the derivative of the log determinant is given by:
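With Σ depending on $x_i$, the identity is the standard one:

$$\frac{\partial \log\det(\Sigma)}{\partial x_i} = \operatorname{tr}\!\left(\Sigma^{-1}\, \frac{\partial \Sigma}{\partial x_i}\right)$$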
Since this inner product gives us a single element of the gradient, we can get the entire gradient using matrix multiplication.
Note that we only care about the diagonal elements of the matrix product. The second expression avoids computing the off-diagonal elements by taking only the dot products of matrix rows/columns that produce the diagonal elements. To do this, we use the Hadamard product, ⊙, and then sum over rows.
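As a minimal MATLAB sketch of the trick (A and B are hypothetical stand-ins for the two factors):

% Full product, then extract the diagonal: computes every off-diagonal entry too.
d_slow = diag(A' * B)';
% Hadamard product summed over rows: computes only the diagonal entries.
d_fast = sum(A .* B);

Both give d(i) = dot(A(:,i), B(:,i)), but the second costs O(n^2) instead of O(n^3).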
We replace several matrices with their 3D versions.
The 3D version of δi is:
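(A sketch consistent with the description of K′ below: three copies of the 1D δi, stacked block-diagonally and permuted into interleaved order:)

$$\delta_i^{3D} = P\,(I_3 \otimes \delta_i)$$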
Here, P is a permutation matrix such that PM converts the rows of M from (x1,x2,...,y1,y2,...,z1,z2,...) to (x1,y1,z1,x2,y2,z2,...).
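A minimal MATLAB sketch of constructing this P, assuming N points (so M has 3N rows):

% Row j of the x-, y-, and z-blocks become rows 3j-2, 3j-1, 3j.
idx = reshape(reshape(1:3*N, N, 3)', [], 1);
P = speye(3*N);
P = P(idx, :);   % P*M is the same as M(idx,:)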
Similarly, the 3D version of Δ is
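(Again only a sketch, assuming the vectors it multiplies are stored in interleaved order, so a permutation appears on both sides:)

$$\Delta^{3D} = P\,(I_3 \otimes \Delta)\,P^\top$$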
The expression for ∂Z(x)/∂xi is no longer a dot product, but the trace of a 3x3 matrix. In practice, this is easy to implement: replace all matrices in eq (1) with their 3D equivalents, then sum each (x,y,z) block in the resulting vector. In MATLAB, we can do this cheaply by reshaping the vector into a 3x(N/3) matrix and summing over rows. If the old expression was
grad_Z = sum(S .* (inv(U) * S * Delta'));
the new expression becomes
grad_Z_3d = sum(reshape(sum(S .* (inv(U) * S * Delta_3d')), 3, []));
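The reshape-and-sum step on its own, as a toy MATLAB sketch:

v = [1 2 3 4 5 6];        % interleaved (x1, y1, z1, x2, y2, z2)
sum(reshape(v, 3, []))    % ans = [6 15], one sum per point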
We can apply a similar transformation to the other term of the gradient (which we called ∇g in the earlier post). Recall that the old expression for a single element of the gradient was
Recall that K′ in 1D is sparse, having the form
In other words, the δi in K′ is replaced with a permuted block-diagonal matrix of three δi's. The dot product in equation (2) then becomes the sum of the three individual dot products for x, y, and z coordinates.
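(Schematically, writing vx, vy, vz for the coordinate blocks of the other factor in eq (2), a sketch:)

$$\delta_i^{3D\,\top} v = \begin{pmatrix} \delta_i^\top v_x \\ \delta_i^\top v_y \\ \delta_i^\top v_z \end{pmatrix}$$

whose entries are exactly the three per-coordinate dot products.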
We can use this observation to generalize the full gradient equation as well. Recall the 1D equation for the full gradient,
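(by analogy with grad_Z above, a sketch of the 1D form:)

grad_g = z .* (Delta' * z);

Applying the same reshape-and-sum step with Delta_3d gives the 3D version: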
grad_g_3d = sum(reshape(z .* (Delta_3d' * z), 3, []));
Mathematically, we can represent this summation by right-multiplying by a permuted stack of identity matrices.
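(A sketch: with P as above and N points, the stack is the 3N×N matrix)

$$P \begin{pmatrix} I_N \\ I_N \\ I_N \end{pmatrix} = P\,(\mathbf{1}_3 \otimes I_N)$$

Right-multiplying a row vector in interleaved order by this matrix sums each (x, y, z) triple, which is exactly what the reshape-and-sum code does.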