Differences in performance between the combined models (lines ending in “−1” in Figure 1) and two comparators: the physiology-only models (lines ending in “−3” in Figure 1) and models that use the physiologic component of the combined model but discard the indicators component at evaluation (lines ending in “−2” in Figure 1). Dashed lines show the difference between the physiology-only models and the combined models; solid lines show the difference between using only the combined model’s physiologic component and using the full combined model. Blue lines denote models fit to Methodist floor data, green lines denote models fit to Methodist ICU data, and red lines denote models fit to Beth Israel ICU data. The top row shows differences in areas under the receiver operating characteristic curves (AUROCs, also known as C-statistics), and the bottom row shows differences in areas under the precision-recall curves (AUPRs); both metrics assess overall discrimination. Values above 0 indicate that the model under evaluation performed better than the combined model; values below 0 indicate that the combined model performed better. Each column shows the performance of all fitted models on one cohort: Methodist floor at left, Methodist ICU at center, and Beth Israel ICU at right. Within a column, results for models trained on that column’s data source are in-sample and measure internal validity, whereas results for models trained on other data sources are out-of-sample and measure external validity. The right column shows that, when validated internally on Beth Israel data, both the physiology-only models and the physiologic components of the combined models perform worse than the full combined model, with the physiology-only models performing slightly better of the two.
Out-of-sample, however, this ordering is reversed for the models fit to Beth Israel data: the full combined model typically fares worst, using only its physiologic component fares best, and the physiology-only models fall in between. For models fit to Methodist data, the differences between the physiology-only models and the physiologic components of the combined models are less pronounced, and the full combined models typically fare best.
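
The sign convention of the plotted differences can be illustrated with a minimal sketch. It assumes each difference is computed as the comparison model's metric minus the combined model's metric, which matches the statement that values above 0 favor the model under evaluation. The function names, the step-wise average-precision estimate of AUPR, and the toy labels and scores are all hypothetical and chosen only for illustration; they are not the study's code or data.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each positive's rank
    return total / n_pos


def delta(metric, labels, comparison_scores, combined_scores):
    """Comparison minus combined: a value above 0 means the comparison model did better."""
    return metric(labels, comparison_scores) - metric(labels, combined_scores)


# Toy data, invented purely for illustration.
labels = [1, 0, 1, 0]
combined = [0.9, 0.2, 0.8, 0.1]     # ranks both positives above both negatives
physio_only = [0.9, 0.8, 0.3, 0.1]  # one negative outranks one positive

d_auroc = delta(auroc, labels, physio_only, combined)
d_aupr = delta(average_precision, labels, physio_only, combined)
print(d_auroc, d_aupr)
```

In this toy case both differences are negative, which under the caption's convention would place the comparison model's line below 0, meaning the combined model performed better.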