To perform interesting queries across datasets, we'll need to merge them into a consistent format. The "structure array" dataset doesn't seem to be suited to complex queries, so we'll use a cell array with rows corresponding to records and columns corresponding to fields. We will merge records from different datasets using (Subject_ID, Visit) as a unique key. We'll also merge the metadata structures of all datasets.
New files:
data/fire_read_all.m
- calls fire_read_csv
on all .csv files in pathdata/merge_datasets.m
converts output of fire_read_all.m
into a NxM cell array and a metadata structure describing each columns.data/fire_classify_data_columns.m
- infers the subtype of each numeric fields (bool, int, float). Also computes coverage. Stores in column metadata.Can now do rudimentary table queries using matlab operators. For example, the code below loads the database and queries, "how many bool fields have coverage above 95%?"
cd /Users/ksimek/work/src/fire/src
% read data
[data, meta] = fire_read_all('../data');
% merge into single "database" table
[db, db_meta] = merge_datasets(data, meta);
% idenitfy column types and measure coverage
db_meta_2 = fire_classify_data_columns(db, db_meta);
% perform query: number of bool fields with 95% or greater coverage
I = ([db_meta_2.coverage] > 95);
sum(cellfun(@(x) strcmp(x,'bool'), {db_meta_2(I).subtype}))
If doing dynamical analysis, consider re-aligning time, using one of the following as origin: