Speech envelope reconstruction. (A) Mean MEG time courses were extracted from 102 surface ROIs. Backward modeling of the speech envelope from the MEG source time courses was performed using a linear decoder with leave-one-out cross-validation. Fisher z-transformed Pearson's correlation coefficients between the original and the reconstructed speech envelope served as the measure of reconstruction accuracy. (B) Mean speech envelope reconstruction accuracy (± standard error of the mean) as a function of stream and relative time lag between sound input and MEG response. Gray boxes highlight the early (15–80 ms), intermediate (90–175 ms), and late (250–450 ms) time windows of interest. The time windows were chosen based on peaks in envelope reconstruction accuracy obtained for the single-speaker condition. (C) Relationship between individual mean speech envelope reconstruction accuracy, averaged from 0 to 500 ms, and musical training, auditory working memory scores, and FFR power. Black lines show statistically significant robust regression slopes obtained for the to-be-ignored speech stream. (D) Ratio between individual estimates of speech reconstruction accuracy for the to-be-ignored and to-be-attended speech streams as a function of musical training and auditory working memory. The relative representation strength of the to-be-ignored stream increased with longer musical training and higher working memory scores. (E) ROIs showing robust tracking of the speech envelope in the single-speaker condition as well as for the to-be-attended and the to-be-ignored streams during selective listening (FDR-corrected P < 0.05, one-tailed).
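
The decoding procedure summarized in panel A can be illustrated with a minimal sketch. It is not the authors' pipeline: the ridge penalty, the division of the data into segments, the lag expressed in samples, and the synthetic data are all assumptions made for illustration, and the function names are hypothetical. It shows a linear decoder trained on all segments but one, evaluated on the held-out segment with a Fisher z-transformed Pearson correlation.

```python
import numpy as np

def fit_decoder(X, y, ridge=1.0):
    """Linear decoder weights from ridge-regularized least squares.
    The ridge penalty is an assumption; the caption only says 'linear decoder'."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_features), X.T @ y)

def loo_reconstruction_accuracy(meg, envelope, lag, ridge=1.0):
    """Leave-one-segment-out envelope reconstruction at a single time lag.

    meg:      array (n_segments, n_samples, n_rois), ROI time courses
    envelope: array (n_segments, n_samples), speech envelope
    lag:      non-negative delay of the MEG response, in samples
    Returns the mean Fisher z-transformed Pearson r over held-out segments.
    """
    n_segments, n_samples, n_rois = meg.shape
    # Pair the envelope at time t with the MEG response at time t + lag.
    X = meg[:, lag:, :]
    y = envelope[:, :n_samples - lag]
    z_values = []
    for test in range(n_segments):
        train = [i for i in range(n_segments) if i != test]
        w = fit_decoder(X[train].reshape(-1, n_rois), y[train].reshape(-1), ridge)
        reconstructed = X[test] @ w
        r = np.corrcoef(reconstructed, y[test])[0, 1]
        z_values.append(np.arctanh(r))  # Fisher z-transform
    return float(np.mean(z_values))

# Synthetic demonstration: 5 of 102 ROIs carry the envelope, delayed by 10 samples.
rng = np.random.default_rng(0)
n_segments, n_samples, n_rois, true_lag = 6, 400, 102, 10
envelope = rng.standard_normal((n_segments, n_samples))
meg = rng.standard_normal((n_segments, n_samples, n_rois))
meg[:, true_lag:, :5] += envelope[:, :n_samples - true_lag, None]

acc_at_true_lag = loo_reconstruction_accuracy(meg, envelope, lag=true_lag)
acc_at_zero_lag = loo_reconstruction_accuracy(meg, envelope, lag=0)
```

Repeating the call over a range of lags gives an accuracy-versus-lag curve of the kind plotted in panel B. In this synthetic example the accuracy should be highest at the simulated delay.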