Found a memory leak in kjb_write_image.
...
convert_image_file_from_raster -> kjb_system -> create_system_command_process -> kjb_fork
Modified inprogres/process_treatment_dates.m
to log-transforms and normalize data. Re-ran clustering. Some observations:
First, noise standard deviation dropped from 25 to 17. The change is surprisingly subtle, considering we significantly reduced the dynamic range of the data. I should probably investigate this more...
Second, observation offset was on the order of 1e5. Should be near zero. Definitely investigate this more.
Centering did seem to fix cluster collapse, and the memberships were a bit more evenly distributed:
Run 1: force offset to zero. does noise variance decrease?
Result: RMS error increased from ~17 to ~19.
looks like a bug in preprocessing
Found bugs:
preprocessed data looks much better now.
New bug: not enabling missing data. Fixing and rerunning...
...
RMS error is down to 0.4 for global regression model. Results for line-fitting in three regions is shown below (blue=before, green=during, red=after)
This seems in the ballpark. Full results below
0.00000000e+00 0.00000000e+00
7.22085911e-01 2.02331116e+00 1.04881185e+00
7.22916954e+01 -2.79730868e+02 1.41688548e+02
-2.56235739e-04 4.05002380e-04 1.79111799e-03 6.25535845e-04 -6.55416611e-04 8.34133235e-05 1.67634110e-03
2.49784115e-01 6.06155787e-01 -1.52977199e+00 1.15511031e+00 6.25982470e-01 4.20970189e-01 -5.78777462e-01
0.473467
Note that observation offset (5th line) is nonzero, even though we centered it. That's because this is the offset for the regression model, which is able to reveal more structure by taking time into account. This explains the lower-than-one standard deviation (6th line). The observation scaling (4th line) is interesting -- IFN-g and IL-8 are negataed, and TNF-a is tiny. Largest apparent activity in IL-1B and IL-2. I'm guessing this could change significantly once we start sampling the observation parameters.
Clustering results
K-means converges after 6 iterations.
Cluster memberships:
Only two strong clusters, despite using a three-cluster model.
Full results:
Observation (global):
A: -2.56235739e-04 4.05002380e-04 1.79111799e-03 6.25535845e-04 -6.55416611e-04 8.34133235e-05 1.67634110e-03
B: 2.49784115e-01 6.06155787e-01 -1.52977199e+00 1.15511031e+00 6.25982470e-01 4.20970189e-01 -5.78777462e-01
eps: 0.473467
cluster #1
m: 7.72956909e-01 2.23224762e+00 8.57568833e-01
b: 4.86565183e+01 -3.35063295e+02 3.15151426e+02
cluster #2
m: 9.18374118e+00 0.00000000e+00 0.00000000e+00
b: -5.45940758e+02 0.00000000e+00 0.00000000e+00
cluster #3
m: -3.69070988e+00 -3.02755277e+00 1.64214653e+00
b: 2.82195009e+02 3.96896877e+02 -2.56559679e+02
Discussion
Cluster 2 (the trivial cluster) seems to only have ata for the first region, which explains why it's ideosyncratic.
Clusters 1 and 3 differ dramatically in slope; cluster three drops significantly in the first two regions.
Do these clusters align closely with treatment type? (do tomorrow)