Figure 2 | The feature fallacy.
Different linear encoding models that span the same space of activity profiles may not be distinguishable. There are many alternative sets of feature vectors {f1, f2, …} that span the same space of activity profiles. In the absence of a prior on the weights of the linear model, all these sets can explain a given set of brain responses equally well. The ambiguity is reduced, but not resolved, when a prior on the weights is assumed (Diedrichsen et al. 2018). If we define a prior on the weights, then each model predicts a probability density over the space of activity profiles. This probabilistic prediction may differ between two sets of basis features, even if they span the same space. For example, if the weight prior is a zero-mean isotropic Gaussian, then each model predicts a Gaussian distribution over the space of activity profiles, and two models that span the same space may predict distinct Gaussian distributions. However, even with a Gaussian weight prior, there are still (infinitely) many equivalent models that make identical probabilistic predictions. We illustrate this by example. (a) Three models (A, B, C) each contain two feature vectors as predictors (A: {fA1, fA2}, B: {fB1, fB2}, C: {fC1, fC2}). The three models all span the same two-dimensional space of activity profiles. For each model, we assume a zero-mean isotropic Gaussian weight prior. (b) All three models predict the same nonisotropic Gaussian probability density over the space of activity profiles (indicated by a single iso-probability-density contour: the ellipse). Model A (gray) predicts the density with two orthogonal features that are aligned with the principal-component axes and have different norms to capture the anisotropy. Model B predicts the same density with two correlated features of similar norm.
Model C falls somewhere in between, combining feature correlation and different feature norms to capture the same Gaussian density over the activity profiles. Note that there are many other models that span the same space but do not induce the same probability density over activity profiles when complemented by a zero-mean isotropic Gaussian weight prior. A given linear encoding model’s success at predicting brain responses provides evidence for the induced distribution of activity profiles, but not for the particular features chosen to express that distribution.
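The equivalence described in this caption can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that an activity profile is u = F w, where the columns of F are the feature vectors and the weights w follow a zero-mean isotropic Gaussian with unit variance, so that the induced covariance of u is F Fᵀ. Under that assumption, multiplying F on the right by any orthogonal matrix leaves F Fᵀ, and hence the predicted Gaussian density, unchanged. The feature values, rotation angles, and names (`rotation`, `induced_covariance`, `describe`, `F_A` to `F_D`) are hypothetical and chosen only for illustration; they are not taken from the figure.

```python
import numpy as np


def rotation(theta):
    """2 x 2 rotation matrix (an orthogonal matrix)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def induced_covariance(F):
    """Covariance of u = F @ w when w ~ N(0, I); columns of F are features."""
    return F @ F.T


def describe(F):
    """Return the feature norms and the cosine of the angle between the two features."""
    norms = np.linalg.norm(F, axis=0)
    cosine = (F[:, 0] @ F[:, 1]) / (norms[0] * norms[1])
    return norms, cosine


# Model A: orthogonal features with different norms (illustrative values).
F_A = np.diag([2.0, 1.0])

# Model B: rotate the weight space by 45 degrees, which yields
# correlated features of equal norm.
F_B = F_A @ rotation(np.pi / 4)

# Model C: an intermediate rotation, which yields features that are
# both correlated and of different norm.
F_C = F_A @ rotation(np.pi / 9)

# Model D: spans the same 2-D space but induces a different (isotropic) density.
F_D = np.eye(2)

cov_A = induced_covariance(F_A)
cov_B = induced_covariance(F_B)
cov_C = induced_covariance(F_C)
cov_D = induced_covariance(F_D)

for name, F in [("A", F_A), ("B", F_B), ("C", F_C), ("D", F_D)]:
    norms, cosine = describe(F)
    same = np.allclose(induced_covariance(F), cov_A)
    print(f"Model {name}: norms={norms}, cosine={cosine:.3f}, same density as A: {same}")
```

In this sketch, models A, B, and C use visibly different features yet share one induced covariance, whereas model D spans the same space and induces a different density. This mirrors the caption's point that predictive success supports the induced distribution rather than any particular feature set.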
