[Work Log] FIRE Debugging cluster model

May 09, 2014
Project FIRE
Subproject Piecewise Linear Clustering (tests)
Working path projects/​fire/​trunk/​src/​piecewise_linear/​test
SVN Revision unknown (see text)
Unless otherwise noted, all filesystem paths are relative to the "Working path" named above.

Initial cluster estimation, something like k-means.

likelihood isn't monotonically increasing over time.

  1. we aren't updating cluster weights
  2. we're doing soft assignment

--

found bug:

Run #1: initialize by K-means (part 1)

Description: do k-means to initialize cluster assignments.

Details:

Estimate A, B: no
estimate epsilon: no
shared observation model: yes
continuity constraints: yes
num clusters: 3
observed dimensions: 7
num observations: 150
A estimation method: regression

Revision: 16767

Method: Build first cluster from all data. Iteratively choose "bad" points under current model as prototypes for new cluster. Then run k-means.

Result: Converges after 15 iterations.

In the 2x150 image below, first row is ground truth clustering, second row is experimental results. Specific coloring is irrelevant, grouping is.

Discussion:

Not great results. Hopefully noise is just too high to find any useful structure without more observations.

Interesting that the g.t. clustering is so uneven. May want to increase Dirichlet distribtuion's alpha parameter.

Run 2: Rerun with small noise

Description: Re-run with noise lowered to 0.01. Revision: 16773 Results: Converges after 3 iterations.

Discussion:

Much better! We don't correctly find the tiny middle cluster, which we can hopefully find during HMC.

TODO: Return to this model when testing HMC

Aside: speeding up Matrix::resize(); adding Matrix::realloc()

The documentation for Matrix::resize claims it reuses allocated space, which is actually untrue. Spent some time playing with reimplementations before realizing any good solution will require reworking the C library, which would require more precision and care than I have the time for at the moment.

Instead, made two halfway measures:

  1. added spacial case; space is reused iff new_cols == num_cols && new_rows <= num_rows.
  2. Added new function: Matrix::realloc, which is like resize, but always preserves allocated storage if possible, and doesn't guarantee data is preserved afterward.

Run 3: Rerun with unknown observation model

Results: same

Posted by Kyle Simek
blog comments powered by Disqus