Today, I'll address some additional issues not covered in the earlier reference post on this topic. First, the original gradient derivation was missing a term corresponding to the normalization constant (which isn't constant as a function of the index set). Second, the previous write-up assumed 1-dimensional data; here we'll generalize the formulae to three dimensions.
Normalization constant
Recall the log of the Gaussian equation, as a function of the indices:
\[
\log(k) - \frac{1}{2} \log(|\Sigma(x)|) + \left( -\frac{1}{2} (y - \mu)^\top \Sigma^{-1}(x) (y - \mu) \right )
\]
When taking the derivative, the first term vanishes, and the third term was handled [in the last writeup](/ksimek/research/2013/11/10/reference/) as \(\nabla g\). We need to find the derivative of the second term. Let \(Z(x) = \frac{1}{2} \log(|\Sigma(x)|)\), and write \(U = \Sigma(x)\). Also, let \(C_{(i)} = S \, \delta_i \, S_i^\top\), so that \(U'_{(i)} = \frac{\partial U}{\partial x_i} = C_{(i)} + C_{(i)}^\top\).
According to equation (38) of The Matrix Cookbook, the derivative of the log determinant is given by
\[
\frac{\partial Z(x)}{\partial x_i} = \frac{1}{2} \text{tr}\left( U^{-1} U'_{(i)} \right) = \text{tr}\left( U^{-1} C_{(i)} \right) = S_i^\top U^{-1} S \, \delta_i,
\]
where the second equality uses the symmetry of \(U\). Since this inner product gives us a single element of the gradient, we can get the entire gradient using matrix multiplication.
\[
\begin{align}
\nabla Z(x) &= \text{diag}\left(S^\top U^{-1} S \Delta^\top\right) \\
&= \sum_i \left(S \odot \left(U^{-1} S \Delta^\top\right)\right)_{(i,:)} \tag{1}
\end{align}
\]
Note that we only care about the diagonal elements of the matrix product. The second expression avoids computing the off-diagonal elements by taking only the dot products of matrix rows/columns that produce the diagonal elements. To do this, we use the Hadamard product, \(\odot\), and then sum over rows.
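As a quick sanity check of this trick, the diagonal of the full product and the Hadamard-product sums from eq. (1) can be compared in Matlab. The matrices below are random placeholders, not the actual model matrices:

```matlab
% Sanity check of eq. (1): the column sums of S .* (inv(U)*S*Delta') equal
% diag(S' * inv(U) * S * Delta'), without forming the off-diagonal entries.
% S, U, and Delta are random placeholders here.
n = 6;
S     = randn(n);
Delta = randn(n);
U     = randn(n);  U = U*U' + n*eye(n);    % symmetric positive definite

M = U \ (S * Delta');                      % inv(U) * S * Delta'
full_diag    = diag(S' * M);               % forms the whole n-by-n product
hadamard_sum = sum(S .* M)';               % only the diagonal entries

assert(norm(full_diag - hadamard_sum) < 1e-9)
```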
Generalizing to three dimensions
We replace several matrices with their 3D versions. Throughout, \(P\) is a permutation matrix such that \(PM\) converts the rows of \(M\) from \((x_1, x_2, ..., y_1, y_2, ..., z_1, z_2, ...)\) to \((x_1, y_1, z_1, x_2, y_2, z_2, ...)\). For example, the 3D version of \(\Delta\) is
\[
\Delta^{(3)} = P \left ( \begin{array}{ccc}\Delta & 0 & 0 \\ 0 & \Delta & 0 \\ 0 & 0 & \Delta \end{array}\right) P^\top
\]
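As a concrete sketch of this construction (the names `P`, `Delta`, and `Delta_3d` are mine, and `N` stands for the number of points), the permutation and the lifted \(\Delta^{(3)}\) can be built in Matlab as:

```matlab
% Sketch: build the interleaving permutation P and the 3D version of Delta.
% Delta is assumed to be N-by-N; the values here are placeholders.
N = 4;
Delta = randn(N);

perm = reshape(1:3*N, N, 3)';    % row 1: x indices, row 2: y indices, row 3: z indices
perm = perm(:);                  % interleaved order: x1, y1, z1, x2, y2, z2, ...
P = sparse(1:3*N, perm, 1);      % P*v reorders v from (x..., y..., z...) to (x1,y1,z1, ...)

Delta_3d = P * blkdiag(Delta, Delta, Delta) * P';
```

Equivalently, `Delta_3d = P * kron(eye(3), Delta) * P'`.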
The vector \(S_i\) becomes a three-column matrix, \([ S_{x_i} S_{y_i} S_{z_i}]\), corresponding to the noise-covariance of the i-th 3D point.
The expression for \(\frac{\partial Z(x)}{\partial x_i}\) is no longer a dot product, but the trace of a 3x3 matrix. In practice, this is easy to implement: replace all matrices in eq (1) with their 3D equivalents, and then sum each (xyz) block in the resulting vector. In Matlab, we can do this cheaply by reshaping the vector into a 3x(N/3) matrix and summing over the rows (i.e., summing the three entries in each column). If the old expression was
grad_Z = sum(S .* (inv(U) * S * Delta'))
the new expression becomes
grad_Z_3d = sum(reshape(sum(S .* (inv(U) * S * Delta_3d')), 3, []));
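To make the "trace of a 3x3 matrix" interpretation concrete, here is a small self-contained check that the reshape-and-sum above reproduces the trace of each 3x3 diagonal block. All matrices are random stand-ins for the 3D quantities:

```matlab
% Check that summing each (xyz) block of the elementwise-diagonal vector
% reproduces the trace of the corresponding 3x3 diagonal block of
% S' * inv(U) * S * Delta_3d'.  All matrices are random stand-ins.
N = 5;                                    % number of 3D points
S        = randn(3*N);
Delta_3d = randn(3*N);
U        = randn(3*N);  U = U*U' + 3*N*eye(3*N);

M = U \ (S * Delta_3d');                  % inv(U) * S * Delta_3d'
grad_Z_3d = sum(reshape(sum(S .* M), 3, []));

ref = zeros(1, N);                        % explicit 3x3 block traces
full_product = S' * M;
for i = 1:N
    blk = 3*(i-1) + (1:3);
    ref(i) = trace(full_product(blk, blk));
end
assert(norm(grad_Z_3d - ref) < 1e-8)
```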
Applying to \(\nabla g\)
We can apply a similar transformation to the other term of the gradient (which we called \(\nabla g\) in the previous post). Recall the old expression for a single element of the gradient was
In other words, the \(\delta_i\) in \(K'\) is replaced with a permuted block-diagonal matrix of three \(\delta_i\)'s.
The dot product in equation (2) then becomes the sum of the three individual dot products for x, y, and z coordinates.
We can use this observation to generalize the full gradient equation. Recall the 1D equation for the full gradient,
\[
\begin{align}
\nabla g = z \odot (\Delta' z) \tag{4}
\end{align}
\]
As in the case of a single element of the gradient, we can generalize to 3D by simply taking the sum of the result for each of the x, y, and z dimensions. We can accomplish this in a vectorized way by replacing \(\Delta\) with its 3D equivalent, \(\Delta^{(3)}\), and then summing each block of (xyz) coordinates in the resulting vector, like we did for \(\nabla Z\). (Note that here, we assume \(z\) is computed using the 3D prior covariance, \(K^{(3)}\), and needs no explicit lifting to 3D.) In Matlab, this looks like
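the following sketch, which assumes \(z\) and `Delta_3d` are already the 3D quantities from above (the name `grad_g_3d` is just illustrative):

```matlab
% Sketch: 3D version of eq. (4).  z .* (Delta_3d' * z) gives the per-coordinate
% contributions; each (xyz) block is then summed to get one element per point.
grad_g_3d = sum(reshape(z .* (Delta_3d' * z), 3, []));
```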