On the issue of HMC step size, I wondered whether anyone had published on the relationship between the ideal step size and the log-posterior's second derivative. I found an answer in section 4.2 of Neal (2011) [pdf], which poses the question in the context of a Gaussian-shaped posterior. Using eigenvalue analysis, he shows that a step size larger than twice the posterior standard deviation results in unstable dynamics, and the state will diverge to infinity. From Neal (2011):
For low-dimensional problems, using a value for ε that is just a bit below the stability limit is sufficient to produce a good acceptance rate. For high-dimensional problems, however, the stepsize may need to be reduced further than this to keep the error in H to a level that produces a good acceptance probability.
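To make the stability limit concrete, here is a minimal sketch (my own illustration, not from Neal's paper) that runs the leapfrog integrator on a one-dimensional Gaussian: with a step size just below \(2\sigma\) the trajectory stays bounded, and just above it the state blows up.

```python
import numpy as np

def leapfrog_gaussian(eps, sigma=1.0, n_steps=200):
    """Leapfrog dynamics for H(q, p) = q^2 / (2 sigma^2) + p^2 / 2."""
    grad_U = lambda q: q / sigma**2
    q, p = 1.0, 0.0
    traj = [q]
    for _ in range(n_steps):
        p -= 0.5 * eps * grad_U(q)   # half step for momentum
        q += eps * p                 # full step for position
        p -= 0.5 * eps * grad_U(q)   # half step for momentum
        traj.append(q)
    return np.array(traj)

# The stability limit for a unit Gaussian is eps = 2: just below it the
# trajectory oscillates with bounded amplitude; just above it, |q| grows
# exponentially and the proposal will always be rejected.
print(np.abs(leapfrog_gaussian(1.9)).max())   # bounded
print(np.abs(leapfrog_gaussian(2.1)).max())   # diverges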
If we can estimate the Hessian of the log-posterior (perhaps just its diagonal), we can choose the step size as a user-settable fraction of the resulting stability limit. Thus, our tuning run amounts to a Laplace approximation: find the posterior mode, estimate the curvature there, and derive the step size from it.
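Here is a rough sketch of what that tuning run might look like. The function name, the `frac` parameter, and the central-difference Hessian diagonal are all my own assumptions; any curvature estimator would do in their place.

```python
import numpy as np
from scipy.optimize import minimize

def tune_step_size(neg_log_post, x0, frac=0.5, h=1e-4):
    """Laplace-style tuning (a sketch, not a definitive implementation):
    find the MAP, estimate the diagonal of the Hessian of -log p(x | data)
    there, and set eps as a fraction `frac` of the stability limit
    2 * sigma_min, where sigma_min = 1 / sqrt(max curvature)."""
    # MAP estimate of the posterior mode
    map_est = minimize(neg_log_post, x0).x

    # central finite differences for the Hessian diagonal
    d = len(map_est)
    f0 = neg_log_post(map_est)
    diag = np.empty(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        diag[i] = (neg_log_post(map_est + e) - 2 * f0
                   + neg_log_post(map_est - e)) / h**2

    # At a proper mode the curvatures are positive; the tightest
    # coordinate direction sets the stability limit.
    sigma_min = 1.0 / np.sqrt(diag.max())
    return frac * 2.0 * sigma_min, map_est
```

Setting `frac` well below 1 leaves headroom for the high-dimensional case Neal warns about above.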
I've devised the strategy below for sampling the clustering model. All of the variables can be Gibbs sampled except the latent piecewise linear model.
Step 3 needs some explanation.
Let \(x\) be the column vector of latent immune activity values at each time \(t\), as provided by the (fixed) piecewise linear model. Recall that the observation model is:
Thus, the error is given by
where \(\otimes\) is the Kronecker product. In our case, the observation noise variance \(\Sigma_\epsilon\) is simply \(I\sigma^2\).
If we assume a uniform prior over A and B, then this Gaussian becomes our conditional posterior distribution, which we can easily sample from within a Gibbs sweep.
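As a concrete sketch of that Gibbs step, suppose each observed series \(g\) follows \(y_g = a_g x + b_g \mathbf{1} + \epsilon\) with \(\epsilon \sim \mathcal{N}(0, I\sigma^2)\); that specific form is my assumption here, since only the noise model is stated above. Under a flat prior, the conditional posterior of \((a_g, b_g)\) is the usual Gaussian around the least-squares solution:

```python
import numpy as np

def gibbs_draw_AB(x, Y, sigma2, rng):
    """Draw (a_g, b_g) for each series g from its Gaussian conditional
    posterior, assuming y_g = a_g * x + b_g + eps, eps ~ N(0, sigma2 * I),
    and a flat prior on (a_g, b_g). (Assumed model form, see lead-in.)

    x : (T,)   latent immune activity from the piecewise linear model
    Y : (G, T) observed series
    """
    T = x.shape[0]
    X = np.column_stack([x, np.ones(T)])   # design matrix [x, 1]
    XtX_inv = np.linalg.inv(X.T @ X)
    cov = sigma2 * XtX_inv                 # shared posterior covariance
    chol = np.linalg.cholesky(cov)
    draws = np.empty((Y.shape[0], 2))
    for g in range(Y.shape[0]):
        mean = XtX_inv @ X.T @ Y[g]        # least-squares mean
        draws[g] = mean + chol @ rng.standard_normal(2)
    return draws[:, 0], draws[:, 1]        # A, B

# Example usage:
# A, B = gibbs_draw_AB(x, Y, sigma2, np.random.default_rng(0))
```

Because the covariance is shared across series, the Cholesky factor is computed once and reused for every draw.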
Posted by Kyle Simek