An overview of the proposed self-supervised collaborative training framework. A raw undersampled k-space data sequence is obtained retrospectively by applying an undersampling mask to the fully sampled data, and two k-space data sequences are then augmented from this raw sequence. In the considered scenario, the two sequences are re-undersampled from the raw sequence using two different re-undersampling masks. Next, the two networks take as inputs the zero-filled image sequences of the two re-undersampled k-space sequences. The image sequences predicted by the networks are transformed into k-space data by the two-dimensional Fourier transform. A co-training loss is then calculated from the two predicted k-space sequences and the raw undersampled k-space data. The backbone reconstruction network can flexibly adopt different iterative unrolled networks, such as CRNN, k-t NEXT, and SLR-Net. Collaborative network-1 and collaborative network-2 share the same network structure but have different weight parameters.
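The data flow in the caption can be illustrated with a minimal NumPy sketch. Several details here are assumptions, not taken from the framework: the masks are hypothetical Cartesian masks that sample random phase-encoding lines per frame, re-undersampling keeps a random subset of the lines already sampled, identity functions stand in for the two unrolled networks, and `cotraining_loss` is an illustrative stand-in because the caption does not define the co-training loss. All helper names are hypothetical.

```python
import numpy as np

def fft2c(x):
    """Orthonormal 2D FFT over the last two axes (image -> k-space)."""
    return np.fft.fft2(x, axes=(-2, -1), norm="ortho")

def ifft2c(k):
    """Orthonormal 2D inverse FFT over the last two axes (k-space -> image)."""
    return np.fft.ifft2(k, axes=(-2, -1), norm="ortho")

def line_mask(shape, accel, rng):
    """Hypothetical Cartesian mask: random phase-encoding lines per frame.
    shape = (frames, ny, nx)."""
    frames, ny, nx = shape
    mask = np.zeros(shape, dtype=bool)
    n_lines = max(1, ny // accel)
    for t in range(frames):
        mask[t, rng.choice(ny, size=n_lines, replace=False), :] = True
    return mask

def reundersample(mask, keep_frac, rng):
    """Hypothetical re-undersampling: keep a random subset of the lines
    already sampled in `mask`, so the result is a subset of `mask`."""
    sub = np.zeros_like(mask)
    for t in range(mask.shape[0]):
        lines = np.flatnonzero(mask[t].any(axis=1))
        n_keep = max(1, int(round(keep_frac * lines.size)))
        sub[t, rng.choice(lines, size=n_keep, replace=False), :] = True
    return sub

def cotraining_loss(k1, k2, y, mask, lam=1.0):
    """Assumed loss (not the paper's definition): each prediction should match
    the raw undersampled k-space on sampled entries, plus a term pulling the
    two predictions towards each other."""
    dc1 = np.mean(np.abs(mask * (k1 - y)) ** 2)
    dc2 = np.mean(np.abs(mask * (k2 - y)) ** 2)
    agree = np.mean(np.abs(k1 - k2) ** 2)
    return dc1 + dc2 + lam * agree

rng = np.random.default_rng(0)
frames, ny, nx = 8, 64, 64
x_full = rng.standard_normal((frames, ny, nx)) + 1j * rng.standard_normal((frames, ny, nx))

mask = line_mask((frames, ny, nx), accel=4, rng=rng)
y = mask * fft2c(x_full)              # raw undersampled k-space (retrospective)

mask1 = reundersample(mask, 0.6, rng)
mask2 = reundersample(mask, 0.6, rng)
y1, y2 = mask1 * y, mask2 * y         # two re-undersampled k-space sequences

# Placeholders: in the framework these would be two unrolled networks with
# the same structure but separately learned weights.
net1 = lambda img: img
net2 = lambda img: img

k1 = fft2c(net1(ifft2c(y1)))          # zero-filled input -> prediction -> k-space
k2 = fft2c(net2(ifft2c(y2)))
loss = cotraining_loss(k1, k2, y, mask)
```

Because each re-undersampling mask is a subset of the original mask, both networks see strictly less data than was acquired, which is what lets the held-out sampled entries act as a self-supervised target in this sketch.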