Input: training data $\{(x_i, y_i)\}_{i=1}^n$ with subject weights $w_i$, where $y_i \in \mathbb{R}$ for regression and $y_i \in \{1, \ldots, K\}$ for classification.
Output: an ensemble tree model.
a) Draw a bootstrap sample $\mathcal{L}_b$ from the training data, with replacement and equal probability on each subject. Denote the out-of-bag data as $\mathcal{L}_o$.
b) At an internal node $T$, stop if the sample size is sufficiently small. Otherwise, randomly generate candidate splitting variables and cutting points. For each candidate split, denote $T_L$ and $T_R$ as the two daughter nodes resulting from the candidate split. Calculate the score
$$\text{Score} = \text{Imp}(T) - \frac{w_{T_L}}{w_T}\,\text{Imp}(T_L) - \frac{w_{T_R}}{w_T}\,\text{Imp}(T_R),$$
where $\text{Imp}(\cdot)$ is the node impurity: the weighted variance for regression and the weighted Gini impurity for classification. In the above definitions, $w_T$, $w_{T_L}$ and $w_{T_R}$ are the sums of subject weights within the corresponding nodes. For regression,
$$\text{Var}(T) = \frac{1}{w_T}\sum_{i \in T} w_i\,(y_i - \bar{y}_T)^2$$
is the weighted variance, with $\bar{y}_T = \frac{1}{w_T}\sum_{i \in T} w_i\, y_i$. For classification,
$$\text{Gini}(T) = \sum_{k=1}^{K} \hat{p}_{T,k}\,(1 - \hat{p}_{T,k})$$
is the weighted Gini impurity, with $\hat{p}_{T,k} = \frac{1}{w_T}\sum_{i \in T} w_i\, \mathbf{1}\{y_i = k\}$ for class $k = 1, \ldots, K$. The other quantities, such as $\text{Var}(T_L)$ and $\text{Gini}(T_R)$, are defined accordingly.
c) Select the candidate split with the highest score, and apply b) and c) to each of the resulting daughter nodes.
Repeat a)–c) until the desired number of trees is fitted.
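The procedure above can be sketched in Python. This is a minimal illustration, not the source's implementation: function names such as `fit_forest` and parameters such as `mtry` and `min_node` are my own assumptions. It implements the regression case with the weighted variance as the impurity; `weighted_gini` shows the classification impurity.

```python
import numpy as np

def weighted_variance(y, w):
    # Var(T) = (1 / w_T) * sum_i w_i * (y_i - ybar_T)^2
    w_T = w.sum()
    ybar = (w * y).sum() / w_T
    return (w * (y - ybar) ** 2).sum() / w_T

def weighted_gini(y, w, K=2):
    # Gini(T) = sum_k p_k * (1 - p_k), with p_k = (1 / w_T) * sum_i w_i * 1{y_i = k}
    w_T = w.sum()
    p = np.array([w[y == k].sum() / w_T for k in range(K)])
    return (p * (1.0 - p)).sum()

def split_score(y, w, mask, impurity):
    # Impurity reduction when node T is split into T_L (mask) and T_R (~mask)
    w_T = w.sum()
    left = impurity(y[mask], w[mask])
    right = impurity(y[~mask], w[~mask])
    return impurity(y, w) - w[mask].sum() / w_T * left - w[~mask].sum() / w_T * right

def build_tree(X, y, w, impurity, min_node=5, mtry=1, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    # b) stop if the node sample size is sufficiently small
    if len(y) <= min_node:
        return {"leaf": True, "pred": (w * y).sum() / w.sum()}
    # randomly generate candidate splitting variables and cutting points
    best = None
    for j in rng.choice(X.shape[1], size=min(mtry, X.shape[1]), replace=False):
        cut = rng.uniform(X[:, j].min(), X[:, j].max())
        mask = X[:, j] <= cut
        if mask.all() or not mask.any():
            continue  # degenerate split: one daughter node would be empty
        score = split_score(y, w, mask, impurity)
        if best is None or score > best[0]:
            best = (score, j, cut, mask)
    if best is None:  # no valid candidate found: make a leaf
        return {"leaf": True, "pred": (w * y).sum() / w.sum()}
    _, j, cut, mask = best
    # c) keep the highest-scoring split and recurse on the daughter nodes
    return {"leaf": False, "var": j, "cut": cut,
            "left": build_tree(X[mask], y[mask], w[mask], impurity, min_node, mtry, rng),
            "right": build_tree(X[~mask], y[~mask], w[~mask], impurity, min_node, mtry, rng)}

def fit_forest(X, y, w, n_trees=10, seed=0, **kw):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        # a) bootstrap with replacement and equal probability; the rest is out-of-bag
        idx = rng.integers(0, len(y), size=len(y))
        oob = np.setdiff1d(np.arange(len(y)), idx)
        tree = build_tree(X[idx], y[idx], w[idx], weighted_variance, rng=rng, **kw)
        forest.append({"tree": tree, "oob": oob})
    return forest
```

For classification, one would pass `weighted_gini` (with the appropriate `K`) as the impurity and replace the weighted-mean leaf prediction by the weighted majority class.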