Algorithm 1: ReprsentConcat Algorithm
Input:
    network_files: paths to adjacency list files
    n: number of genes in the input networks
    d: number of output dimensions
    onttype: which type of annotations to use
    early_stopping_rounds: number of rounds without improvement before stopping
Output:
    opt_pred_results: prediction results

for i = 1 : length(network_files)
    A = load_network(network_files(i), n)
    Q = rwr(A, 0.5)
    R = ln(Q + 1/n)
    U, Σ, V = svd(R)
    X = hstack(X, X_cur)
end for
Y = load_annotation(onttype)    // load annotations
// split the data into train data and test data
X_train, Y_train, X_test, Y_test = train_test_split(X, Y)
layer_id = 0
while 1
    if layer_id == 0
        X_cur_train = zeros(X_train)
        X_cur_test = zeros(X_test)
    else
        X_cur_train = X_proba_train.copy()
        X_cur_test = X_proba_test.copy()
    end if
    X_cur_train = hstack(X_cur_train, X_train)
    X_cur_test = hstack(X_cur_test, X_test)
    for estimator in n_randomForests
        // train each forest through k-fold cross validation
        y_probas = estimator.fit_transform(X_cur_train, Y_train)
        y_train_proba_li += y_probas
        y_test_probas = estimator.predict_proba(X_cur_test)
        y_test_proba_li += y_test_probas
    end for
    y_train_proba_li /= length(n_randomForests)
    y_test_proba_li /= length(n_randomForests)
    train_avg_F1 = calc_F1(Y_train, y_train_proba_li)    // calculate the F1 value
    test_avg_F1 = calc_F1(Y_test, y_test_proba_li)
    test_F1_list.append(test_avg_F1)
    opt_layer_id = get_opt_layer_id(test_F1_list)
    if opt_layer_id == layer_id
        opt_pred_results = [y_train_proba_li, y_test_proba_li]
    end if
    if layer_id - opt_layer_id >= early_stopping_rounds
        return opt_pred_results
    end if
    layer_id += 1
end while
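The per-network embedding loop at the top of Algorithm 1 (random walk with restart, log transform, SVD, horizontal concatenation) can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the listing does not say how `rwr` normalizes `A` or how `X_cur` is obtained from the SVD factors, so the sketch assumes a column-normalized transition matrix with the closed-form RWR solution and takes `X_cur` to be the first `d` left singular vectors scaled by the square roots of their singular values. The helper names `embed_network` and `concat_embeddings` are hypothetical.

```python
import numpy as np


def rwr(A, restart_prob):
    """Closed-form random walk with restart (assumed form of rwr in Algorithm 1)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    A = A - np.diag(np.diag(A))                    # drop self-loops
    isolated = (A.sum(axis=0) == 0).astype(float)
    A = A + np.diag(isolated)                      # isolated nodes walk to themselves
    P = A / A.sum(axis=0, keepdims=True)           # column-stochastic transitions
    # Solve (I - (1 - p) P) Q = p I for the stationary diffusion states Q.
    return np.linalg.solve(np.eye(n) - (1.0 - restart_prob) * P,
                           restart_prob * np.eye(n))


def embed_network(A, d, restart_prob=0.5):
    """Q = rwr(A, 0.5); R = ln(Q + 1/n); SVD; keep d dimensions."""
    n = np.asarray(A).shape[0]
    Q = rwr(A, restart_prob)
    R = np.log(Q + 1.0 / n)                        # 1/n keeps the log finite
    U, S, _ = np.linalg.svd(R)
    # ASSUMPTION: X_cur = U_d * sqrt(S_d); the listing leaves X_cur undefined.
    return U[:, :d] * np.sqrt(S[:d])


def concat_embeddings(adjacency_matrices, d):
    """X = hstack of the d-dimensional embedding of every network."""
    return np.hstack([embed_network(A, d) for A in adjacency_matrices])


# Toy example: two copies of a 4-node path graph, d = 2 per network.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
X = concat_embeddings([path, path], d=2)
print(X.shape)
```

With `k` networks the concatenated feature matrix has `n` rows and `k * d` columns, which is what the later train/test split consumes.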
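The stopping rule inside the `while` loop of Algorithm 1 can be isolated from the forest training, as in the minimal sketch below. It makes several assumptions that the listing does not confirm: `get_opt_layer_id` is taken to return the index of the highest test F1 (earliest layer on ties), the work of one cascade layer is hidden behind a hypothetical callback `train_layer(layer_id)` that returns `(test_avg_F1, predictions)`, and a `max_layers` cap is added as a safety bound in place of the unbounded `while 1`.

```python
def run_cascade(train_layer, early_stopping_rounds, max_layers=100):
    """Grow layers until the best test F1 is early_stopping_rounds layers old.

    train_layer is a hypothetical callback standing in for one cascade layer:
    it takes layer_id and returns (test_avg_F1, predictions).
    """
    test_f1_list = []
    opt_layer_id = None
    opt_pred_results = None
    for layer_id in range(max_layers):             # ASSUMPTION: cap replaces "while 1"
        test_avg_f1, predictions = train_layer(layer_id)
        test_f1_list.append(test_avg_f1)
        # ASSUMPTION: get_opt_layer_id = index of the best test F1 so far
        # (max returns the earliest index when scores tie).
        opt_layer_id = max(range(len(test_f1_list)),
                           key=test_f1_list.__getitem__)
        if opt_layer_id == layer_id:
            opt_pred_results = predictions         # remember the best layer's output
        if layer_id - opt_layer_id >= early_stopping_rounds:
            break                                  # no improvement for long enough
    return opt_layer_id, opt_pred_results


# Usage with made-up F1 scores per layer (illustrative values only).
fake_scores = [0.50, 0.62, 0.60, 0.61, 0.70]


def fake_layer(layer_id):
    return fake_scores[layer_id], f"preds-{layer_id}"


best_layer, best_preds = run_cascade(fake_layer, early_stopping_rounds=2)
print(best_layer, best_preds)
```

In this toy run the cascade stops after layer 3, because the best score (layer 1) is then two layers old, so the later, higher score at layer 4 is never evaluated. A larger `early_stopping_rounds` trades extra training time for a lower chance of stopping before such a late improvement.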