2020 Apr 28;1(2):100019. doi: 10.1016/j.patter.2020.100019

Table 1.

Description of Data Type, Classification Task, Training-Set Sizes, and Neural Network Architectures for Each Application Studied

| | CXR | EXR | HCT | EEG |
|---|---|---|---|---|
| Data type | single 2D radiograph | multiple 2D radiograph views | 3D CT reconstruction | 19-channel EEG time series |
| Classification task | normal vs. abnormal | normal vs. abnormal | hemorrhage vs. no hemorrhage | seizure onset vs. no seizure onset |
| Anatomy | chest | knee | head | head |
| Train set size (large) | 50,000 | 30,000 | 4,000 | 30,000 |
| Train set size (medium) | 5,000 | 3,000 | 400 | 3,000 |
| Train set size (literature) | 20,000 [7] | 40,561 [32] | 904 [6] | 23,218 [33] |
| Network architecture | 2D ResNet-18 [9] | patient-averaged 2D ResNet-50 [9] | 3D MIL + ResNet-18 + Attention [35] | 1D Inception DenseNet [36] |

We apply cross-modal data programming to four different data types: 2D single chest radiographs (CXR), 2D extremity radiograph series (EXR), 3D reconstructions of computed tomography of the head (HCT), and 19-channel electroencephalography (EEG) time series. We use two different dataset sizes in this work: the full labeled dataset (large) of a size that might be available for an institutional study (i.e., physician-years of hand labeling) and a 10% subsample of the entire dataset (medium) of a size that might be reasonably achievable by a single research group (i.e., physician-months of hand labeling). For context, we present the size of comparable datasets used to train high-performance models in the literature. Finally, we list the different standard model architectures used. While each image model uses a residual network encoder [9], architectures vary from a simple single-image network (CXR) to a mean across multiple image views (EXR) to a dynamically weighted attention mechanism that combines image encodings for each axial slice of a volumetric image (HCT). For EEG time series, an architecture combining the best attributes of the Residual and Densely Connected [37] networks for 1D applications is used, in which each channel is encoded separately and a fully connected layer is used to combine features extracted from each (see Experimental Procedures).
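
To make the difference between the EXR and HCT aggregation strategies concrete, the following is a minimal sketch in plain Python, not the implementation used in the study. It assumes each view or axial slice has already been encoded into a fixed-length feature vector, and it assumes a simple attention form in which a scalar score per slice is obtained from a dot product with a scoring vector and normalized with a softmax. The names `mean_pool`, `attention_pool`, and `score_weights` are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(encodings: Sequence[Vector]) -> Vector:
    """Element-wise mean over per-view encodings (EXR-style aggregation)."""
    n = len(encodings)
    return [sum(column) / n for column in zip(*encodings)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(encodings: Sequence[Vector], score_weights: Vector) -> Vector:
    """Attention-weighted sum over per-slice encodings (HCT-style aggregation).

    Assumption: the score of each slice is a dot product between its encoding
    and `score_weights`; a trained model would learn this scoring function.
    """
    scores = [sum(w * x for w, x in zip(score_weights, enc)) for enc in encodings]
    alphas = softmax(scores)  # one non-negative weight per slice, summing to 1
    dim = len(encodings[0])
    return [sum(a * enc[d] for a, enc in zip(alphas, encodings)) for d in range(dim)]


if __name__ == "__main__":
    slices = [[1.0, 2.0], [3.0, 4.0]]
    print(mean_pool(slices))
    print(attention_pool(slices, [0.0, 0.0]))
```

With an all-zero scoring vector every slice receives the same weight, so attention pooling reduces to the mean. A scoring vector that strongly favors one slice lets that slice dominate the combined encoding, which is the behavior a dynamically weighted mechanism is meant to provide for volumetric images where only a few slices carry the finding.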