Table 1.

| | CXR | EXR | HCT | EEG |
|---|---|---|---|---|
| Data type | single 2D radiograph | multiple 2D radiograph views | 3D CT reconstruction | 19-channel EEG time series |
| Classification task | normal vs. abnormal | normal vs. abnormal | hemorrhage vs. no hemorrhage | seizure onset vs. no seizure onset |
| Anatomy | chest | knee | head | head |
| Train set size (Large/Medium) | 50,000 / 5,000 | 30,000 / 3,000 | 4,000 / 400 | 30,000 / 3,000 |
| Train set size (Literature) | 20,000⁷ | 40,561³² | 904⁶ | 23,218³³ |
| Network architecture | 2D ResNet-18⁹ | patient-averaged 2D ResNet-50⁹ | 3D MIL + ResNet-18 + Attention³⁵ | 1D Inception DenseNet³⁶ |
We apply cross-modal data programming to four different data types: 2D single chest radiographs (CXR), 2D extremity radiograph series (EXR), 3D reconstructions of computed tomography of the head (HCT), and 19-channel electroencephalography (EEG) time series. We use two different dataset sizes in this work: the full labeled dataset (large) of a size that might be available for an institutional study (i.e., physician-years of hand labeling) and a 10% subsample of the entire dataset (medium) of a size that might be reasonably achievable by a single research group (i.e., physician-months of hand labeling). For context, we present the size of comparable datasets used to train high-performance models in the literature. Finally, we list the different standard model architectures used. While each image model uses a residual network encoder,⁹ architectures vary from a simple single-image network (CXR) to a mean across multiple image views (EXR) to a dynamically weighted attention mechanism that combines image encodings for each axial slice of a volumetric image (HCT). For EEG time series, an architecture combining the best attributes of the Residual and Densely Connected³⁷ networks for 1D applications is used, in which each channel is encoded separately and a fully connected layer is used to combine features extracted from each (see Experimental Procedures).
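The two multi-input aggregation strategies above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the encoders are replaced by precomputed encoding arrays, and the scoring vector `w` of the attention pool is a hypothetical stand-in for the learned attention parameters.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def mean_pool(view_encodings):
    """EXR-style aggregation: average the per-view encodings for a patient."""
    return np.mean(view_encodings, axis=0)

def attention_pool(slice_encodings, w):
    """HCT-style aggregation: score each axial-slice encoding against a
    (here hypothetical) learned vector w, softmax the scores into dynamic
    weights, and return the weighted combination of slice encodings."""
    scores = slice_encodings @ w        # one scalar score per slice
    weights = softmax(scores)           # input-dependent weights, sum to 1
    return weights @ slice_encodings    # weighted sum of encodings

# Toy example: 5 slices/views, 8-dimensional encodings.
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))
w = rng.normal(size=8)
pooled_mean = mean_pool(enc)        # shape (8,)
pooled_attn = attention_pool(enc, w)  # shape (8,)
```

Mean pooling weights every view equally, while the attention pool lets the model emphasize the slices most relevant to the prediction (e.g., those containing a hemorrhage).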