Debugging new implementation of EM GMM fitting with handling of missing data.
After first iteration, likelihood weights (in p_sum_mp) are grossly uneven. Likely issue is in the E step.
Variances in (var_mp) are suspect in the E step. So are means (in u_mp). Preceeding M step now looks suspect. How are mean and variance computed?
Was resetting accumulators for x and x2 after each point. Part of mis-guided refactoring attempt yesterday. Rebuilding and re-running; are mean, variance, and likelihood weights reasonable after first iteration?
Okay, results are identical to reference implementation when no missing data. Adding missing data to the dataset...
Results look good at 50% missing, but occasionally getting local optima (~10% of the time). In those cases, the final likleihood is lower, so no evidence of bug.
Time for clean up. A bit of clean up to the test, adding run-time flags for missing, held-out, etc.
Did signfiicant rework of the GMM EM test suite. Now can specify which of five tests to run, and non-interactive mode tests all five tests.
Posted by Kyle Simek