Haplo Prediction
predict haplogroups
Predicts haplogroups using models trained on Y-STR data.
Licensed under Creative Commons BY-NC-SA 3.0.
Questions or comments? Contact Joseph Schlecht.
Use 'haplo-train' to train the models.
usage: haplo-train OPTIONS [data-fname | <stdin>]

  -h, --help
      Prints program usage.
  -v, --version
      Prints program version.
  --header-in
      The input data contains a header (descriptive) line, which should be
      discarded.
  --header-out
      Write a header (descriptive) line to the first line of the output
      results.
  --options=ARG
      File containing program options. Any options appearing on the command
      line following this option take precedence over those in the options
      file.
  --seed=ARG
      Random seed.
  --input-format=ARG
      Input file format. Must be one of {txt, csv, xml}. If the input is XML,
      it must conform to the XML DTD haplo-input.dtd.
  --input-dtd=ARG
      If the input format is XML, validate it with this DTD.
  --output-format=ARG
      Output file format. Must be one of {txt, csv, xml}.
  --labels=ARG
      XML file containing the organization and listing of possible haplogroup
      labels for the samples. Must conform to the XML DTD haplo-labels.dtd.
  --labels-dtd=ARG
      Validate the XML labels file with this DTD.
  --id-cols=ARG
      Comma-separated ordered list of columns to use for sample
      identification. Prefixes the output of each sample. Count begins with 1
      at the first column of the file. Set to zero to ignore the id column.
  --label-col=ARG
      Column containing the haplogroup labels. Count begins with 1 at the
      first column of the file. Set to zero to ignore the label column.
  --1st-marker-col=ARG
      Column containing the first marker. Use in conjunction with num-markers
      to specify the markers for reading. All other markers are assumed to
      follow this one. Count begins with 1.
  --num-markers=ARG
      Number of markers to read. Use in conjunction with 1st-marker-col to
      specify the markers for reading.
  --marker-cols=ARG
      Comma-separated ordered list of markers to use for training. Use instead
      of 1st-marker-col and num-markers. Count begins with 1 at the first
      column of the CSV file.
  --model-dir=ARG
      Directory to put trained models in.
  --data-out-dir=ARG
      Directory to put the generated training data in for each model. The name
      of the model is used, so if this directory is set as the same as the
      model-dir, the models could be overwritten. The default is not to output
      the data.
  --nb-freq=ARG
      Naive Bayes non-parametric frequency model tree information.
  --nb-freq-dtd=ARG
      Validate the naive Bayes non-parametric frequency model tree information
      XML file with this DTD.
  --nb-gauss=ARG
      Naive Bayes Gaussian model tree information.
  --nb-gauss-dtd=ARG
      Validate the naive Bayes Gaussian model tree information XML file with
      this DTD.
  --nb-gmm=ARG
      Naive Bayes Gaussian mixture model tree information.
  --nb-gmm-dtd=ARG
      Validate the naive Bayes Gaussian mixture model tree information XML
      file with this DTD.
  --mv-gmm=ARG
      Multivariate Gaussian mixture model tree information.
  --mv-gmm-dtd=ARG
      Validate the multivariate Gaussian mixture model tree information XML
      file with this DTD.
  --mv-mmm=ARG
      Multivariate multinomial mixture model tree information.
  --mv-mmm-dtd=ARG
      Validate the multivariate multinomial mixture model tree information XML
      file with this DTD.
  --svm=ARG
      SVM model tree information.
  --svm-dtd=ARG
      Validate the SVM model tree information XML file with this DTD.
  --weka-j48=ARG
      Weka J48 model tree information.
  --weka-part=ARG
      Weka PART model tree information.
  --weka-jar=ARG
      Weka Java archive file. Required for using the Weka algorithms.
  --weka-dtd=ARG
      Validate the Weka model tree information XML files with this DTD.
  --nearest=ARG
      Nearest neighbor model information.
  --nearest-dtd=ARG
      Validate the nearest neighbor model information XML file with this DTD.
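The column options (id-cols, label-col, 1st-marker-col, num-markers, marker-cols) all count from 1, which is easy to get wrong when preparing a data file. The Python sketch below illustrates how the id, label, and marker fields of one CSV row could be picked out under those conventions. It is not the haplo-train implementation: the function name select_columns, its defaults, and the sample row are hypothetical, and it assumes marker values are whole-number repeat counts.

```python
import csv
import io


def select_columns(row, id_cols=(1,), label_col=2, first_marker_col=3,
                   num_markers=None, marker_cols=None):
    """Split one row into (ids, label, markers) using 1-based column numbers.

    Hypothetical helper that mirrors the documented option semantics:
    a column number of zero means "ignore", and an explicit marker_cols
    list is used instead of first_marker_col / num_markers.
    """
    # Column numbers are 1-based, so subtract 1 to index the Python list.
    ids = [row[c - 1] for c in id_cols if c != 0]
    label = row[label_col - 1] if label_col != 0 else None

    if marker_cols is not None:
        cols = list(marker_cols)
    elif num_markers is not None:
        # Markers are assumed to sit in consecutive columns.
        cols = range(first_marker_col, first_marker_col + num_markers)
    else:
        raise ValueError("give either marker_cols or num_markers")

    # Assumption: marker values are whole-number repeat counts.
    markers = [int(row[c - 1]) for c in cols]
    return ids, label, markers


# Made-up sample: id, haplogroup label, then four Y-STR repeat counts.
data = io.StringIO("sample-1,R1b,13,24,14,11\n")
row = next(csv.reader(data))

# Equivalent in spirit to: id column 1, label column 2,
# first marker column 3, four markers.
print(select_columns(row, num_markers=4))

# No id or label column, and an explicit ordered marker list (columns 6, 3).
print(select_columns(row, id_cols=(0,), label_col=0, marker_cols=(6, 3)))
```

Note that an explicit marker list keeps the order in which the columns are given, so listing column 6 before column 3 returns the markers in that order.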
Use 'haplo-predict' to predict haplogroups with the trained models.
usage: haplo-predict OPTIONS [data-fname | <stdin>]

  -h, --help
      Prints program usage.
  -v, --version
      Prints program version.
  --header-in
      The input data contains a header (descriptive) line, which should be
      discarded.
  --header-out
      Write a header (descriptive) line to the first line of the output
      results.
  --exclude-one
      When performing the tandem prediction decision, exclude at most one
      prediction from the set of classification algorithms. There must be
      three or more algorithms in play for this to take effect.
  --options=ARG
      File containing program options. Any options appearing on the command
      line following this option take precedence over those in the options
      file.
  --seed=ARG
      Random seed.
  --input-format=ARG
      Input file format. Must be one of {txt, csv, xml}. If the input is XML,
      it must conform to the XML DTD haplo-input.dtd.
  --input-dtd=ARG
      If the input format is XML, validate it with this DTD.
  --output-format=ARG
      Output file format. Must be one of {txt, csv, xml}.
  --labels=ARG
      XML file containing the organization and listing of possible haplogroup
      labels for the samples. Must conform to the XML DTD haplo-labels.dtd.
  --labels-dtd=ARG
      Validate the XML labels file with this DTD.
  --id-cols=ARG
      Comma-separated ordered list of columns to use for sample
      identification. Prefixes the output of each sample. Count begins with 1
      at the first column of the file. Set to zero to ignore the id column.
  --label-col=ARG
      Column containing the haplogroup labels. Count begins with 1 at the
      first column of the file. Set to zero to ignore the label column.
  --1st-marker-col=ARG
      Column containing the first marker. Use in conjunction with num-markers
      to specify the markers for reading. All other markers are assumed to
      follow this one. Count begins with 1.
  --num-markers=ARG
      Number of markers to read. Use in conjunction with 1st-marker-col to
      specify the markers for reading.
  --marker-cols=ARG
      Comma-separated ordered list of markers to use for training. Use instead
      of 1st-marker-col and num-markers. Count begins with 1 at the first
      column of the CSV file.
  --output=ARG
      File to output the predictions to. The default is stdout.
  --model-dir=ARG
      Directory containing the trained models.
  --nb-freq=ARG
      Naive Bayes non-parametric frequency model tree information.
  --nb-freq-dtd=ARG
      Validate the naive Bayes non-parametric frequency model tree information
      XML file with this DTD.
  --nb-gauss=ARG
      Naive Bayes Gaussian model tree information.
  --nb-gauss-dtd=ARG
      Validate the naive Bayes Gaussian model tree information XML file with
      this DTD.
  --nb-gmm=ARG
      Naive Bayes Gaussian mixture model tree information.
  --nb-gmm-dtd=ARG
      Validate the naive Bayes Gaussian mixture model tree information XML
      file with this DTD.
  --mv-gmm=ARG
      Multivariate Gaussian mixture model tree information.
  --mv-gmm-dtd=ARG
      Validate the multivariate Gaussian mixture model tree information XML
      file with this DTD.
  --svm=ARG
      SVM model tree information.
  --svm-dtd=ARG
      Validate the SVM model tree information XML file with this DTD.
  --weka-j48=ARG
      Weka J48 model tree information.
  --weka-part=ARG
      Weka PART model tree information.
  --weka-jar=ARG
      Weka Java archive file. Required for using the Weka algorithms.
  --weka-dtd=ARG
      Validate the Weka model tree information XML files with this DTD.
  --nearest=ARG
      Nearest neighbor model information.
  --nearest-dtd=ARG
      Validate the nearest neighbor model information XML file with this DTD.
  --nearest-max-d=ARG
      Maximum distance allowed for a nearest neighbor classification.
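The nearest-max-d option caps the distance at which a nearest neighbor classification is accepted. As a rough illustration of that idea, the Python sketch below labels a sample with the haplogroup of its closest reference haplotype and declines to classify when that neighbor is farther away than the cap. The distance metric (sum of absolute repeat-count differences), the function names, and the reference data are all assumptions made for this example; the usage text does not say which metric haplo-predict uses.

```python
def manhattan(a, b):
    """Sum of absolute differences between two equal-length marker vectors."""
    if len(a) != len(b):
        raise ValueError("marker vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_label(sample, references, max_d):
    """Return (label, distance) for the closest reference haplotype.

    The label is None when the closest reference is farther than max_d,
    i.e. the sample is left unclassified. references is a non-empty list
    of (label, markers) pairs.
    """
    best_label, best_dist = None, None
    for label, markers in references:
        d = manhattan(sample, markers)
        if best_dist is None or d < best_dist:
            best_label, best_dist = label, d
    if best_dist is None:
        raise ValueError("no reference haplotypes given")
    if best_dist > max_d:
        # Closest match is too far away to be trusted.
        return None, best_dist
    return best_label, best_dist


# Made-up reference haplotypes: (label, repeat counts at four markers).
references = [
    ("R1b", [13, 24, 14, 11]),
    ("I1", [13, 22, 14, 10]),
    ("J2", [12, 23, 15, 10]),
]

# One repeat away from the first reference: accepted with max_d=2.
print(nearest_label([13, 24, 14, 10], references, max_d=2))  # ('R1b', 1)

# Far from every reference: rejected with max_d=2.
print(nearest_label([16, 28, 17, 14], references, max_d=2))  # (None, 13)
```

In this sketch a distance exactly equal to the cap still counts as allowed; whether the real tool treats the boundary the same way is not stated in the usage text.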