[Reference] Marginal likelihood gradient (part 2)

November 25, 2013

Today I'll cover two issues left out of the earlier reference post on this topic. First, the original gradient derivation was missing a term corresponding to the normalization constant (which isn't constant as a function of the index set). Second, the previous write-up assumed 1-dimensional data; today we'll talk about generalizing the formulae to three dimensions.

Normalization constant

Recall the gaussian equation, as a function of indices:

$$k\,|\Sigma(x)|^{-1/2} \exp\left\{ -\tfrac{1}{2} (y-\mu)^\top \Sigma^{-1}(x)\,(y-\mu) \right\}$$

Taking the log gives

$$\log(k) - \tfrac{1}{2}\log(|\Sigma(x)|) - \tfrac{1}{2}(y-\mu)^\top \Sigma^{-1}(x)\,(y-\mu)$$
When taking the derivative, the first term vanishes, and the third term was handled [in the last writeup](/ksimek/research/2013/11/10/reference/) as $g$. We need the derivative of the second term. Let $Z(x) = -\tfrac{1}{2}\log(|\Sigma(x)|)$. Also, let $C'(i) = S\,\delta_i\,S_i^\top$, so $U'(i) = C'(i) + C'(i)^\top$.

According to equation (38) of The Matrix Cookbook, the derivative of the log determinant is given by:

$$\begin{aligned}
\frac{\partial Z(x)}{\partial x_i} &= -\frac{1}{2}\operatorname{Tr}\left[U^{-1}\,U'\right] \\
&= -\frac{1}{2}\operatorname{Tr}\left[U^{-1}\left(C' + C'^\top\right)\right] \\
&= -\frac{1}{2}\left(\operatorname{Tr}\left[U^{-1}C'\right] + \operatorname{Tr}\left[U^{-1}C'^\top\right]\right) \\
&= -\frac{1}{2}\left(\operatorname{Tr}\left[U^{-1}S\,\delta_i\,S_i^\top\right] + \operatorname{Tr}\left[U^{-1}S_i\,\delta_i^\top S^\top\right]\right) \\
&= -\frac{1}{2}\left(\operatorname{Tr}\left[S_i^\top U^{-1}S\,\delta_i\right] + \operatorname{Tr}\left[\delta_i^\top S^\top U^{-1}S_i\right]\right) \\
&= -\frac{1}{2}\cdot 2\,\operatorname{Tr}\left[S_i^\top U^{-1}S\,\delta_i\right] \\
&= -S_i^\top U^{-1}S\,\delta_i
\end{aligned}$$

(The two trace terms are equal because each is a scalar and $U$ is symmetric.)

Since this inner product gives us a single element of the gradient, we can get the entire gradient using matrix multiplication.

$$\nabla Z(x) = -\operatorname{diag}\left(S^\top U^{-1} S\,\Delta\right) = -\sum_i \left(S \circ \left(U^{-1} S\,\Delta\right)\right)_{(i,:)} \tag{1}$$

Note that we only care about the diagonal elements of the matrix product, where $\Delta$ is the matrix whose $i$-th column is $\delta_i$. The second expression avoids computing the off-diagonal elements by taking only the dot products between the matrix rows/columns that produce the diagonal elements. To do this, we use the Hadamard product, $\circ$, and then sum over rows.
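As a quick numerical sanity check of eq. (1), here is a sketch with small random stand-ins for $S$, $U$, and $\Delta$ (all variable names are hypothetical): the Hadamard/row-sum form agrees with extracting the diagonal directly.

    % Sanity check: the Hadamard/row-sum form matches diag() on random inputs.
    N = 5;
    S = randn(N);                         % stand-in for the noise matrix S
    A = randn(N);
    U = A*A' + N*eye(N);                  % random symmetric positive-definite U
    Delta = randn(N);                     % stand-in for Delta = [delta_1 ... delta_N]
    gz1 = -diag(S' * inv(U) * S * Delta)';   % direct form of eq. (1)
    gz2 = -sum(S .* (inv(U) * S * Delta));   % Hadamard form, summed over rows
    assert(max(abs(gz1 - gz2)) < 1e-10);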

Generalizing to three dimensions

We replace several matrices with their 3D versions.

The 3D version of $\delta_i$ is:

$$\delta_i^{(3)} = P\begin{pmatrix} \delta_i & 0 & 0 \\ 0 & \delta_i & 0 \\ 0 & 0 & \delta_i \end{pmatrix}$$

Here, $P$ is a permutation matrix such that $PM$ converts the rows of $M$ from $(x_1, x_2, \ldots, y_1, y_2, \ldots, z_1, z_2, \ldots)$ order to $(x_1, y_1, z_1, x_2, y_2, z_2, \ldots)$ order.
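For concreteness, here is one way to build $P$ in matlab (a sketch; `idx` and the example size `N` are hypothetical, and in practice the index vector can be applied directly without ever forming $P$):

    % Build the permutation P that interleaves (x1..xN, y1..yN, z1..zN)
    % into (x1, y1, z1, x2, y2, z2, ...).
    N = 4;                                       % number of 3D points (example size)
    idx = reshape(reshape(1:3*N, N, 3)', [], 1);
    P = speye(3*N);
    P = P(idx, :);                               % row k of P*M is row idx(k) of M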

Similarly, the 3D version of $\Delta$ is

$$\Delta^{(3)} = P\begin{pmatrix} \Delta & 0 & 0 \\ 0 & \Delta & 0 \\ 0 & 0 & \Delta \end{pmatrix}P^\top$$
The vector $S_i$ becomes a three-column matrix, $[S_{x_i}\ S_{y_i}\ S_{z_i}]$, corresponding to the noise covariance of the $i$-th 3D point.
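Both block-diagonal liftings above can be written compactly with `kron` (a sketch reusing `N` and `P` from the previous snippet; `delta_i` and `Delta` are random stand-ins):

    % Lift a single delta_i (N x 1) and the full Delta (N x N) to 3D.
    delta_i  = randn(N, 1);
    Delta    = randn(N);
    delta_3d = P * kron(eye(3), delta_i);      % 3N x 3 permuted block diagonal
    Delta_3d = P * kron(eye(3), Delta) * P';   % 3N x 3N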

The expression for $\frac{\partial Z(x)}{\partial x_i}$ is no longer a dot product, but the trace of a 3x3 matrix. In practice, this is easy to implement by replacing all matrices in eq. (1) with their 3D equivalents, and then summing each $(x, y, z)$ block in the resulting vector. In matlab, we can do this cheaply by reshaping the length-$3N$ vector into a $3 \times N$ matrix and summing over rows. If the old expression was

    grad_Z = -sum(S .* (inv(U) * S * Delta))

the new expression becomes

    grad_Z_3d = -sum(reshape(sum(S .* (inv(U) * S * Delta_3d)), 3, []));
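Continuing the sketch above (with a hypothetical $m \times 3N$ matrix `S3` whose columns are interleaved per point like everything else, and `U3` a random symmetric positive-definite matrix), we can confirm that the reshape trick reproduces the per-point 3x3 traces:

    % Check: entry i of grad_Z_3d is -trace(S_i' * inv(U) * S * delta_i_3d).
    m  = 7;                                    % hypothetical data dimension
    S3 = randn(m, 3*N);
    B  = randn(m);
    U3 = B*B' + m*eye(m);
    grad_Z_3d = -sum(reshape(sum(S3 .* (inv(U3) * S3 * Delta_3d)), 3, []));
    for i = 1:N
        d3 = Delta_3d(:, 3*i-2:3*i);           % lifted delta for point i
        t  = -trace(S3(:, 3*i-2:3*i)' * inv(U3) * S3 * d3);
        assert(abs(grad_Z_3d(i) - t) < 1e-8);
    end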

Applying to g

We can apply a similar transformation to the other term of the gradient (which we called $g$ above). Recall that the old expression for a single element of the gradient was

$$\frac{\partial g}{\partial x_i} = \frac{1}{2}\,z^\top K'(i)\,z \tag{2}$$

Recall that $K'$ in 1D is sparse, having the form

$$K' = \begin{pmatrix} 0 & 0 \\ \delta_i & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ \delta_i & 0 \end{pmatrix}^{\!\top}$$
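Since $K'$ has the form $B + B^\top$, the quadratic form in eq. (2) collapses to a single product, $\tfrac{1}{2}z^\top(B + B^\top)z = z^\top B z$, which is why one dot product per element suffices. A quick numerical illustration (hypothetical sizes):

    % For any B, (1/2) z' (B + B') z equals z' B z, since z'B'z = (z'Bz)'.
    n = 6;
    B = randn(n);
    z = randn(n, 1);
    assert(abs(0.5 * z' * (B + B') * z - z' * B * z) < 1e-10);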
Generalizing to the 3D equivalent, $K'^{(3)}$, the equation becomes:

$$K'^{(3)} = P\begin{pmatrix} 0 & 0 \\ \delta_i^{(3)} & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ \delta_i^{(3)} & 0 \end{pmatrix}^{\!\top} P^\top$$

In other words, the $\delta_i$ in $K'$ is replaced with a permuted block-diagonal matrix of three $\delta_i$'s. The dot product in equation (2) then becomes the sum of the three individual dot products for the x, y, and z coordinates.

We can use this observation to generalize the full gradient equation to 3D. Recall the 1D equation for the full gradient,

$$\nabla g = z \circ \left(\Delta^\top z\right)$$
As in the case of a single element of the gradient, we can generalize to 3D by simply taking the sum of the result for each of the x, y, and z dimensions. We can accomplish this in a vectorized way by replacing $\Delta$ with its 3D equivalent, $\Delta^{(3)}$, and then summing each block of $(x, y, z)$ coordinates in the resulting vector, as we did for $\nabla Z$. (Note that here we assume $z$ is computed using the 3D prior covariance, $K^{(3)}$, and needs no explicit lifting to 3D.) In matlab, this looks like
    grad_g = sum(reshape(z .* (Delta_3d' * z), 3, []))

Mathematically, we can represent this summation by right-multiplying by a permuted stack of identity matrices.

$$\nabla g^{(3)} = \left[z \circ \left(\Delta^{(3)\top} z\right)\right]^\top P \begin{pmatrix} I \\ I \\ I \end{pmatrix}$$
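As a final check (a sketch reusing `N`, `P`, and `Delta_3d` from the earlier snippets, with a random stand-in for `z`), the stacked-identity form matches the reshape trick:

    % The permuted stack of identities reproduces the reshape-based sum.
    z  = randn(3*N, 1);
    v  = z .* (Delta_3d' * z);
    g1 = sum(reshape(v, 3, []));           % reshape trick
    g2 = v' * P * repmat(eye(N), 3, 1);    % [z o (Delta_3d' z)]' P [I; I; I]
    assert(max(abs(g1 - g2)) < 1e-10);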
Posted by Kyle Simek