[Work Log] FIRE - initial analysis

March 05, 2014

Merging datasets into database

To perform interesting queries across datasets, we'll need to merge them into a consistent format. The "structure array" dataset doesn't seem to be suited to complex queries, so we'll use a cell array with rows corresponding to records and columns corresponding to fields. We will merge records from different datasets using (Subject_ID, Visit) as a unique key. We'll also merge the metadata structures of all datasets.

New files:

data/fire_read_all.m - calls fire_read_csv on all .csv files in path
data/merge_datasets.m converts output of fire_read_all.m into a NxM cell array and a metadata structure describing each columns.
data/fire_classify_data_columns.m - infers the subtype of each numeric fields (bool, int, float). Also computes coverage. Stores in column metadata.

Can now do rudimentary table queries using matlab operators. For example, the code below loads the database and queries, "how many bool fields have coverage above 95%?"

cd /Users/ksimek/work/src/fire/src
% read data
[data, meta] = fire_read_all('../data');
% merge into single "database" table
[db, db_meta] = merge_datasets(data, meta);
% idenitfy column types and measure coverage
db_meta_2 = fire_classify_data_columns(db, db_meta);
% perform query: number of bool fields with 95% or greater coverage
I = ([db_meta_2.coverage] > 95); 
sum(cellfun(@(x) strcmp(x,'bool'), {db_meta_2(I).subtype}))

Miscellaneous thoughts

If doing dynamical analysis, consider re-aligning time, using one of the following as origin:

first treatment
first chemo treament
first rad treament
inflamation threshold

Posted by Kyle Simek

FIRE discussion → ← FIRE - background reading, thinking and planning

KLS Research Blog ← Nothing to see here...

[Work Log] FIRE - initial analysis

Merging datasets into database

Miscellaneous thoughts