1 input_data_folder |
Path to folder in which input data can be found |
/input_data |
2 input_data_files |
List of prefixes of data files |
['input_data1', 'input_data2'] |
3 gold_standard_file |
File name of gold_standard_file, must be in input_data_folder |
{'input_data': 'gold_standard_file.txt'} |
4 read_csv_kwargs |
pandas.read_csv keyword arguments for input data |
{'test_input': {'index_col':[0]}} |
5 output_folder |
Path to folder into which results should be written |
/results |
6 intermediates_folder |
Name of subfolder to put intermediate results |
clustering_intermediates |
7 clustering_results |
Name of subfolder to put aggregated results |
clustering |
8 clusterer_kwargs |
Additional arguments to pass to clusterers |
{'KMeans': {'random_state': 8}} |
9 generate_parameters_addtl_kwargs |
Additional keyword arguments for the hypercluster.AutoClusterer class |
{'KMeans': {'random_search': true}} |
10 evaluations |
Names of evaluation metrics to use |
['silhouette_score', 'number_clustered'] |
11 eval_kwargs |
Additional kwargs per evaluation metric function |
{'silhouette_score': {'random_state': 8}} |
12 metric_to_choose_best |
Which metric to maximize to choose the labels |
silhouette_score |
13 metric_to_compare_labels |
Which metric to use to compare label results to each other |
adjusted_rand_score |
14 compare_samples |
Whether to make a table and figure with counts of how often each pair of samples falls in the same cluster |
"true" |
15 output_kwargs |
pandas.DataFrame.to_csv and pandas.read_csv keyword arguments for output tables |
{'evaluations': {'index_col':[0]}, 'labels': {'index_col':[0]}} |
16 heatmap_kwargs |
Arguments for seaborn.heatmap for pairwise visualizations |
{'vmin':-2, 'vmax':2} |
17 optimization_parameters |
Which algorithms and corresponding hyperparameters to try |
{'KMeans': {'n_clusters': [5, 6, 7] }} |
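
To make the table concrete, the example values above can be collected into a single mapping. The following is a minimal sketch in Python, not the package's own loader: the dict layout is an assumption about how the configuration might be held in memory, and `expand_grid` is a hypothetical helper, written here only to illustrate how the lists in `optimization_parameters` could be enumerated into individual hyperparameter settings.

```python
from itertools import product

# Example values copied from the table above. Holding them in a plain dict
# is an assumption made for illustration.
config = {
    "input_data_folder": "/input_data",
    "input_data_files": ["input_data1", "input_data2"],
    "gold_standard_file": {"input_data": "gold_standard_file.txt"},
    "read_csv_kwargs": {"test_input": {"index_col": [0]}},
    "output_folder": "/results",
    "intermediates_folder": "clustering_intermediates",
    "clustering_results": "clustering",
    "clusterer_kwargs": {"KMeans": {"random_state": 8}},
    "generate_parameters_addtl_kwargs": {"KMeans": {"random_search": True}},
    "evaluations": ["silhouette_score", "number_clustered"],
    "eval_kwargs": {"silhouette_score": {"random_state": 8}},
    "metric_to_choose_best": "silhouette_score",
    "metric_to_compare_labels": "adjusted_rand_score",
    "compare_samples": "true",  # kept as the quoted string shown in the table
    "output_kwargs": {
        "evaluations": {"index_col": [0]},
        "labels": {"index_col": [0]},
    },
    "heatmap_kwargs": {"vmin": -2, "vmax": 2},
    "optimization_parameters": {"KMeans": {"n_clusters": [5, 6, 7]}},
}


def expand_grid(optimization_parameters):
    """Hypothetical helper: list every hyperparameter combination per algorithm.

    For each algorithm, take the Cartesian product of its candidate value
    lists and return one dict per combination.
    """
    grid = {}
    for algorithm, params in optimization_parameters.items():
        names = sorted(params)
        combos = product(*(params[name] for name in names))
        grid[algorithm] = [dict(zip(names, combo)) for combo in combos]
    return grid


grid = expand_grid(config["optimization_parameters"])
print(grid)
# {'KMeans': [{'n_clusters': 5}, {'n_clusters': 6}, {'n_clusters': 7}]}
```

With the example values, the single `n_clusters` list yields three KMeans settings. Adding a second hyperparameter list would multiply the number of combinations accordingly.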