[Work Log] FIRE date analysis

April 10, 2014
  1. awk script to replace blanks with -99
  2. awk script to replace dates with epoch numbers
  3. decide which columns to use and how to pool
  4. clustering

Something is wrong with the scripts that generate the database -- some values are missing from the goldStandardDateCRF key field. It's because some records exist solely to hold immunity data, which doesn't have an official gold standard date column. Can probably ignore for now.

Clustering

For now, use pooled scores for each of 5 databses:

sri3.csv sritot
das4.csv dastot
ea4.csv  easumtot
fact4.csv facttot
pss4.csv psstot

The datasets were originally named here.

Read data, isolated columns of interest, and concatenated each subjects' visits into long rows, then write to csv file for processing:

fire_data_path = '../../data/'
[db, db_meta] = fire_build_self_report_database(fire_data_path);
[cluster_db, cluster_db_meta] = fire_gather_columns('../../experiments/2014_04_10_cluster_3/columns.txt', db_meta, db);
wide_db = e_2014_03_merge_visits(cluster_db)
fire_write_csv(wide_db(:,2:end), [], 'test.csv')

EXPERIMENT PATH: experiments/2014_04_10_cluster_3

README file in experiment path describe how to prepare data and run experiment.

Result: 3-cluster model, with apparently very good separation. reulsts are in $EXPERIMENT_PATH/out/cluster_results.txt

Posted by Kyle Simek
blog comments powered by Disqus