[Work Log] FIRE - improving fitting

May 26, 2014
Project FIRE
Subproject Piecewise Linear Clustering
Working path projects/​fire/​trunk/​src/​piecewise_linear
SVN Revision unknown (see text)
Unless otherwise noted, all filesystem paths are relative to the "Working path" named above.

Run: Refactoring kmeans, fixed bug

Description: Refactored k-means to make replicates easier to run. Also fixed a bug in how collapsed clusters are handled. Results:

BEFORE REPOPULATION FIX

    10 itns
    ----------
    training error: 3.61348
    test error: 4.60147

    15 itns
    --------
    training error: 3.60094
    test error: 4.58815

    20 itns
    ----------
    training error: 3.60094
    test error: 4.58815

AFTER REPOPULATION FIX

    10 itns
    ---------
    training error: 3.5833
    test error: 4.57327

    15 itns
    ---------
    training error: 3.58389
    test error: 4.57315

    20 itns
    ---------
    training error: 3.58411
    test error: 4.57321

Discussion:

Neither of these results match what I was getting on Friday. Testing error in particular is worse. Did I break something?

Found it: error in evaluation code arising from bad copy/paste.

Another issue: should be using centered_data.txt, not data.txt.

Run: multiple repetitions

Description: Run kmeans with 10 repetitions.
Issues:

Results:

trivial model error: 3.55057

10 itns
-------
training error: 3.54682
test error: 4.51326

20 itns
------------
training error: 3.53998
test error: 4.52958

Training and teesting error improve over the single-initialization version. Test error is slightly worse for 20-iteration run; possibly due to overfitting.

Run: New baseline - use centered data

Description: Perform 10 replicates of k-means using centered log-transformed data.

Trivial model error: 1.40502

10 itns
-----------
training error: 1.10196
test error: 1.11679

15 itns
----------
training error: 1.09979
test error: 1.09903

20 itns
--------------
training error: 1.09756
test error: 1.09137

Run: compare against null model

Description: Do we do better or worse with a constant model? (slope zero, intercept zero) Results: see previous runs; "trivial model" results have been added.

Discussion:
It is interesting that the trivial model performs better on raw data than the cluster model. With rescaled data, the cluster model performs better.

Run: continuous model (aborted)

Description: Re-run using the continuous model. Details:

Will need to re-run all experiments.

Run: baseline (rerun)

Description: Re-run baseline fitting of centered data using discontinuous model. 10 Repetitions

Results:

Trivial model error: 1.58913
Single cluster error: 1.58365
training error: 1.61625
test error: 1.64005

Discussion: trivial model outperforms clustered model. This shouldn't be happening, need to investigate.

Posted by Kyle Simek
blog comments powered by Disqus