[Work Log] FIRE - first clustering test

May 15, 2014

Project	FIRE
Subproject	Piecewise Linear Clustering
Working path	projects/fire/trunk/src/piecewise_linear
SVN Revision	16806

Unless otherwise noted, all filesystem paths are relative to the "Working path" named above.

Run #1:

Description: Run initial model estimation on real data for the first time.

Results:

Single cluster:

0.00000000e+00  0.00000000e+00
8.73210306e-03  6.75372609e-02  3.38471516e-02
1.88790755e+00 -1.88387427e+01 -9.88494611e+00
-8.09450747e-01 -2.56388154e-01 -3.61218358e-01 -4.23137123e-02  1.64628674e-02 -1.02962552e-02 -3.82633520e-01
-5.26954930e+01 -9.01436620e+00 -2.14090141e+01  1.88492958e+00  6.61450704e+00  8.49971831e+00 -2.38576056e+01
24.6497

Multiple cluster:

num_clusters:3
log weights: 0 0 0
cluster #1
  0.00000000e+00  0.00000000e+00
 -2.68876589e-01 -7.04390256e-02  1.45450526e-02
  6.16258430e+03  6.13067965e+03  6.12930057e+03
 -1.08771413e-02  5.76005556e-04 -1.04761899e-02 -1.69093096e-03  9.36991863e-04 -3.78348959e-04  1.27422332e-03
 -4.89152880e+01 -9.21454922e+00 -1.77681544e+01  2.47259001e+00  6.28886806e+00  8.63120843e+00 -2.43004449e+01
26.2893
cluster #2
  0.00000000e+00  0.00000000e+00
  1.20215477e-01  0.00000000e+00  2.24383835e-01
  6.11014579e+03  0.00000000e+00  5.96700128e+03
 -1.08771413e-02  5.76005556e-04 -1.04761899e-02 -1.69093096e-03  9.36991863e-04 -3.78348959e-04  1.27422332e-03
 -4.89152880e+01 -9.21454922e+00 -1.77681544e+01  2.47259001e+00  6.28886806e+00  8.63120843e+00 -2.43004449e+01
26.2893
cluster #3
  0.00000000e+00  0.00000000e+00
 -1.52115349e-01  2.77516102e+00  1.82492585e+00
 -3.84569849e+00 -7.18131761e+02 -1.63056790e+02
 -1.08771413e-02  5.76005556e-04 -1.04761899e-02 -1.69093096e-03  9.36991863e-04 -3.78348959e-04  1.27422332e-03
 -4.89152880e+01 -9.21454922e+00 -1.77681544e+01  2.47259001e+00  6.28886806e+00  8.63120843e+00 -2.43004449e+01
26.2893

Discussion

Surprisingly high epsilon (~26). This is far beyond the dynamic range of the data, suggesting either (a) a bug, (b) a terrible model, or (c) failure of the analytical estimation method to find a good result. Option (a) seems most likely, since even a flat line gives a lower error variance than this. Perhaps our observation basis A was poorly estimated. Let's re-run with the PCA method.
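
The flat-line comparison above can be checked directly. The following is a minimal sketch, not code from the FIRE tree: it assumes the observations are available as a NumPy array of shape (num_points, num_dims) and that epsilon is reported on the same scale as an error variance. The function name flat_line_error_variance and the sample data are hypothetical.

```python
import numpy as np

def flat_line_error_variance(observations):
    """Residual variance of a constant (flat-line) model, per dimension.

    The least-squares constant fit is the per-dimension mean, so the
    residual variance equals the (population) sample variance.
    """
    observations = np.asarray(observations, dtype=float)
    residuals = observations - observations.mean(axis=0)
    return (residuals ** 2).mean(axis=0)

# Hypothetical data: 4 points, 2 observed dimensions.
example = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0]])
baseline = flat_line_error_variance(example)
print(baseline)
```

If the fitted model's epsilon exceeds this baseline on the real data, the fit is worse than no model at all, which would point toward a bug rather than a weak model.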

Run #2: PCA method

Description: Re-run, using PCA instead of regression to estimate the observation transformation A (i.e. set the constant use_regression_method to false).
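
For reference, a generic PCA basis estimate can be sketched as below. This illustrates the standard technique only, not the project's implementation: it assumes the columns of A are taken to be the leading principal directions of the mean-centered observations, and the function name pca_basis and its arguments are hypothetical.

```python
import numpy as np

def pca_basis(observations, num_components):
    """Return (mean, basis, singular_values) for a PCA of the observations.

    observations: array of shape (num_points, num_dims)
    basis: array of shape (num_dims, num_components) whose columns are the
           leading principal directions (unit length, mutually orthogonal).
    """
    X = np.asarray(observations, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:num_components].T
    return mean, basis, singular_values

# Hypothetical data lying exactly on the line y = x.
points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
mean, basis, singular_values = pca_basis(points, num_components=1)
print(basis.ravel())  # direction along (1, 1) / sqrt(2), up to sign
```

The sign of each principal direction is arbitrary, so any comparison between a PCA-derived basis and a regression-derived one should be made up to sign.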

Results:

Single cluster:

  0.00000000e+00  0.00000000e+00
  8.73210306e-03  6.75372609e-02  3.38471516e-02
  1.88790755e+00 -1.88387427e+01 -9.88494611e+00
 -8.09450747e-01 -2.56388154e-01 -3.61218358e-01 -4.23137123e-02  1.64628674e-02 -1.02962552e-02 -3.82633520e-01
 -5.26954930e+01 -9.01436620e+00 -2.14090141e+01  1.88492958e+00  6.61450704e+00  8.49971831e+00 -2.38576056e+01
24.6497

Multiple cluster:

num_clusters:3
log weights: 0 0 0
cluster #1
  0.00000000e+00  0.00000000e+00
  1.13041314e-02  6.75372609e-02  3.41203573e-02
  3.56421036e-01 -1.88387427e+01 -1.11922319e+01
 -8.09450747e-01 -2.56388154e-01 -3.61218358e-01 -4.23137123e-02  1.64628674e-02 -1.02962552e-02 -3.82633520e-01
 -5.26954930e+01 -9.01436620e+00 -2.14090141e+01  1.88492958e+00  6.61450704e+00  8.49971831e+00 -2.38576056e+01
24.6497
cluster #2
  0.00000000e+00  0.00000000e+00
  1.05708618e-03  0.00000000e+00  0.00000000e+00
  5.35534897e+01  0.00000000e+00  0.00000000e+00
 -8.09450747e-01 -2.56388154e-01 -3.61218358e-01 -4.23137123e-02  1.64628674e-02 -1.02962552e-02 -3.82633520e-01
 -5.26954930e+01 -9.01436620e+00 -2.14090141e+01  1.88492958e+00  6.61450704e+00  8.49971831e+00 -2.38576056e+01
24.6497
cluster #3
  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  3.06188063e-02
  1.18920520e+02  0.00000000e+00  7.71079754e+01
 -8.09450747e-01 -2.56388154e-01 -3.61218358e-01 -4.23137123e-02  1.64628674e-02 -1.02962552e-02 -3.82633520e-01
 -5.26954930e+01 -9.01436620e+00 -2.14090141e+01  1.88492958e+00  6.61450704e+00  8.49971831e+00 -2.38576056e+01
24.6497

Discussion

No noticeable improvement: epsilon is still ~25. Cluster collapse was frequent during K-means, something that did not occur in the previous run.
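
To quantify how often collapse happens, the K-means loop could be instrumented along these lines. This is a sketch under assumptions, not the project's code: it treats a cluster as collapsed when it has fewer than min_size assigned points, and it re-seeds an empty cluster at the point farthest from its current center, which is one common heuristic among several. All function names are hypothetical.

```python
import numpy as np

def find_collapsed_clusters(labels, num_clusters, min_size=1):
    """Indices of clusters with fewer than min_size assigned points."""
    counts = np.bincount(np.asarray(labels, dtype=int), minlength=num_clusters)
    return np.flatnonzero(counts < min_size)

def reseed_collapsed(points, centers, labels):
    """Return a copy of centers with each empty cluster moved to the point
    that is currently farthest from its own assigned center."""
    points = np.asarray(points, dtype=float)
    centers = np.array(centers, dtype=float)  # copy; input is left untouched
    labels = np.asarray(labels, dtype=int)
    distances = np.linalg.norm(points - centers[labels], axis=1)
    for k in find_collapsed_clusters(labels, len(centers)):
        farthest = int(np.argmax(distances))
        centers[k] = points[farthest]
        distances[farthest] = 0.0  # avoid re-using the same point twice
    return centers

# Hypothetical example: cluster 2 has no assigned points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
centers = np.array([[0.0, 0.5], [5.0, 5.0], [100.0, 100.0]])
labels = np.array([0, 0, 1])
print(find_collapsed_clusters(labels, 3))
print(reseed_collapsed(points, centers, labels))
```

Logging the output of find_collapsed_clusters at every iteration would show whether collapse is concentrated in the first few iterations (suggesting poor initialization) or persists throughout.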