Author manuscript; available in PMC: 2021 Aug 23.

Published in final edited form as: Nature. 2017 Jan 25;542(7639):115–118. doi: 10.1038/nature21056

Extended Data Table 1 | Disease-Partitioning Algorithm

Algorithm 1 Disease Partitioning Algorithm
1:	Inputs
2:	    taxonomy (tree): the disease taxonomy
3:	    maxClassSize (int): maximum data points in a class
4:	Output
5:	    partition (list of sets): partition of the diseases into classes
6:
7:	procedure Descendants(node)
8:	    return {node} ∪ ⋃ {Descendants(child) for child in node.children}
9:
10:	procedure NumImages(nodes)
11:	    return sum(length(node.images) for node in nodes)
12:
13:	procedure PartitionDiseases(node)
14:	    class ← Descendants(node)
15:	    if NumImages(class) < maxClassSize then
16:	        append class to partition
17:	    else
18:	        for child in node.children do
19:	            PartitionDiseases(child)
20:
21:	partition ← [ ]
22:	PartitionDiseases(taxonomy.root)
23:	return partition

This algorithm uses the taxonomy to partition the diseases into fine-grained training classes. We find that training on these finer classes improves the classification accuracy of the coarser inference classes. The algorithm begins at the root node and recursively descends the taxonomy (line 19), turning a node into a training class if the amount of data it contains (with the convention that nodes contain their children) is below a specified threshold (line 15). Because the partitioning is recursive, it preserves the taxonomy structure and, consequently, the clinical similarity between diseases grouped into the same training class. The data restriction (and the fact that training data are fairly evenly distributed among the leaf nodes) forces the average class size to be slightly less than maxClassSize. Together, these components generate training classes that leverage the fine-grained information contained in the taxonomy structure while balancing two failure modes: classes that are overly fine-grained and lack sufficient data to be learned properly, and classes that are too coarse and too data-abundant, which prevent the algorithm from properly learning less data-abundant classes. With maxClassSize = 1,000, this algorithm yields 757 training classes.
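
For readers who prefer executable code, the pseudocode of Algorithm 1 can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' implementation: the `Node` class, the function names, and the toy taxonomy with its image counts are hypothetical and chosen only to mirror the identifiers in the listing. The sketch reads line 8 as the union of all descendant sets and keeps the strict inequality of line 15.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can live in sets
class Node:
    """Hypothetical taxonomy node; field names follow the pseudocode."""
    name: str
    images: list = field(default_factory=list)
    children: list = field(default_factory=list)


def descendants(node):
    """Return the node together with every node beneath it (line 8)."""
    found = {node}
    for child in node.children:
        found |= descendants(child)
    return found


def num_images(nodes):
    """Total number of images attached to the given nodes (line 11)."""
    return sum(len(n.images) for n in nodes)


def partition_diseases(root, max_class_size):
    """Collect training classes by descending from the root (lines 13-23)."""
    partition = []

    def visit(node):
        cls = descendants(node)
        if num_images(cls) < max_class_size:
            partition.append(cls)   # small enough: the whole subtree is one class
        else:
            for child in node.children:
                visit(child)        # too large: try each child subtree instead

    visit(root)
    return partition


# Toy taxonomy (made-up names and counts, for illustration only).
benign = Node("benign", children=[
    Node("nevus", images=[0] * 3),
    Node("keratosis", images=[0] * 2),
])
malignant = Node("malignant", children=[
    Node("melanoma", images=[0] * 4),
    Node("carcinoma", images=[0] * 1),
])
root = Node("skin disease", children=[benign, malignant])

classes = partition_diseases(root, max_class_size=6)
print([sorted(n.name for n in c) for c in classes])
# [['benign', 'keratosis', 'nevus'], ['carcinoma', 'malignant', 'melanoma']]
```

In this toy case the root holds 10 images, which is not below the threshold of 6, so the procedure descends; each of the two subtrees holds 5 images and becomes one training class. Read literally, the pseudocode places a leaf whose own images reach the threshold in no class, since it has no children to recurse into; the sketch keeps that behavior rather than guessing how such cases were handled in practice.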