An integrated system for data homogenization and prediction of protein-protein binding affinity is illustrated. The system accepts the sequences of two proteins (top), which are fed to a learned energy model that quantitatively predicts their dissociation rate. To train the model, multiple data types with varying degrees of precision, directness, and physical characterization are used (bottom). Each data type is passed through its own data homogenizer (D.H.), which brings all data modalities into congruence. For quantitative data, conventional double-sided loss functions penalize the model whenever its predictions deviate from the ground truth in either direction. For binary data, one-sided and potentially learnable loss functions (see main text) penalize only those predictions that are clearly in conflict with the ground truth. The entire model, comprising the parameters of the energy model and of the data homogenizers, is trained jointly, using an inner loop for the energy model and an outer loop for the data homogenizers to ensure correct training behavior. A key assumption of the model is that the number of distinct experimental conditions and assays is substantially smaller than the number of distinct data points (right); otherwise, the model is non-identifiable. Throughout the illustration, green indicates raw data, blue indicates terms coming from principles-based modeling, and pink indicates learnable quantities.
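
The distinction between the two loss families can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the form used in the main text: the function names, the fixed `threshold` and `margin` parameters, the squared penalty, and the sign convention (lower predicted energy means tighter binding) are all hypothetical, and the learnable variant of the one-sided loss is not shown.

```python
def double_sided_loss(pred, target):
    """Squared error for quantitative data.

    Penalizes the prediction whenever it deviates from the measured
    value, regardless of the direction of the deviation.
    """
    return (pred - target) ** 2


def one_sided_loss(pred, binds, threshold=0.0, margin=0.0):
    """Hinge-style penalty for binary data (hypothetical form).

    Assumed convention: `pred` is a predicted binding energy, and lower
    values mean tighter binding. A pair labeled as binding should have
    pred <= threshold - margin; a non-binding pair should have
    pred >= threshold + margin. Predictions on the correct side of the
    boundary incur no penalty at all.
    """
    if binds:
        violation = pred - (threshold - margin)
    else:
        violation = (threshold + margin) - pred
    # Only positive violations (clear conflicts with the label) count.
    return max(0.0, violation) ** 2


if __name__ == "__main__":
    # Quantitative measurement: any deviation is penalized.
    print(double_sided_loss(1.0, 3.0))
    # Binary label "binds": a strongly negative energy is consistent.
    print(one_sided_loss(-2.0, binds=True))
    # Binary label "binds": a positive energy conflicts with the label.
    print(one_sided_loss(1.5, binds=True))
```

The one-sided form reflects the fact that a binary label only bounds the affinity from one side, so a prediction far inside the consistent region carries no information that should be penalized.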