TABLE 4.
Rates of correct DIF detection among DIF scenarios.
Setting | DIF form | DIF size | n | J | ROSALI-DIF FORWARD: %LRT SIG | ROSALI-DIF FORWARD: Most flexible | ROSALI-DIF FORWARD: Flexible | ROSALI-DIF FORWARD: Perfect | PCMLasso: Most flexible | PCMLasso: Flexible | PCMLasso: Perfect |
1 | H | Weak | 400 | 4 | 31% | 4% | 3% | 2% | 5% | 3% | 0% |
1 | H | Weak | 400 | 7 | 27% | 5% | 4% | 3% | 6% | 3% | 0% |
1 | H | Weak | 800 | 4 | 64% | 20% | 17% | 14% | 16% | 9% | 1% |
1 | H | Weak | 800 | 7 | 63% | 26% | 18% | 14% | 18% | 9% | 0% |
1 | H | Medium | 400 | 4 | 85% | 39% | 34% | 29% | 35% | 17% | 0% |
1 | H | Medium | 400 | 7 | 79% | 44% | 30% | 25% | 39% | 16% | 0% |
1 | H | Medium | 800 | 4 | 99% | 90% | 73% | 67% | 80% | 42% | 1% |
1 | H | Medium | 800 | 7 | 99% | 91% | 66% | 59% | 83% | 46% | 2% |
1 | NH | Weak | 400 | 4 | 33% | 5% | 4% | 1% | 8% | 6% | 5% |
1 | NH | Weak | 400 | 7 | 31% | 4% | 2% | 0% | 6% | 3% | 3% |
1 | NH | Weak | 800 | 4 | 68% | 29% | 25% | 3% | 27% | 18% | 17% |
1 | NH | Weak | 800 | 7 | 62% | 27% | 18% | 2% | 23% | 14% | 12% |
1 | NH | Medium | 400 | 4 | 91% | 56% | 49% | 6% | 49% | 26% | 25% |
1 | NH | Medium | 400 | 7 | 79% | 51% | 35% | 5% | 49% | 24% | 23% |
1 | NH | Medium | 800 | 4 | 100% | 96% | 77% | 31% | 88% | 49% | 49% |
1 | NH | Medium | 800 | 7 | 100% | 96% | 65% | 24% | 90% | 52% | 51% |
2 | H | Weak | 400 | 4 | 29% | 3% | 1% | 1% | 5% | 3% | 0% |
2 | H | Weak | 400 | 7 | 26% | 2% | 2% | 1% | 6% | 2% | 0% |
2 | H | Weak | 800 | 4 | 68% | 23% | 20% | 18% | 17% | 9% | 0% |
2 | H | Weak | 800 | 7 | 58% | 21% | 13% | 10% | 17% | 8% | 0% |
2 | H | Medium | 400 | 4 | 83% | 41% | 35% | 30% | 35% | 15% | 0% |
2 | H | Medium | 400 | 7 | 82% | 41% | 30% | 25% | 43% | 15% | 1% |
2 | H | Medium | 800 | 4 | 99% | 91% | 77% | 68% | 76% | 37% | 1% |
2 | H | Medium | 800 | 7 | 99% | 92% | 65% | 61% | 85% | 46% | 2% |
2 | NH | Weak | 400 | 4 | 34% | 6% | 6% | 1% | 10% | 6% | 6% |
2 | NH | Weak | 400 | 7 | 32% | 7% | 5% | 1% | 9% | 5% | 4% |
2 | NH | Weak | 800 | 4 | 74% | 34% | 32% | 2% | 26% | 16% | 15% |
2 | NH | Weak | 800 | 7 | 67% | 32% | 22% | 2% | 27% | 15% | 13% |
2 | NH | Medium | 400 | 4 | 86% | 50% | 42% | 5% | 47% | 27% | 26% |
2 | NH | Medium | 400 | 7 | 79% | 51% | 39% | 6% | 46% | 22% | 22% |
2 | NH | Medium | 800 | 4 | 100% | 95% | 81% | 26% | 86% | 54% | 53% |
2 | NH | Medium | 800 | 7 | 99% | 96% | 65% | 22% | 89% | 51% | 50% |
3 | H | Weak | 400 | 4 | 21% | 2% | 2% | 1% | 2% | 0% | 0% |
3 | H | Weak | 400 | 7 | 23% | 2% | 2% | 2% | 1% | 1% | 0% |
3 | H | Weak | 800 | 4 | 44% | 7% | 7% | 6% | 2% | 0% | 0% |
3 | H | Weak | 800 | 7 | 44% | 13% | 10% | 8% | 9% | 3% | 0% |
3 | H | Medium | 400 | 4 | 58% | 19% | 18% | 13% | 7% | 2% | 0% |
3 | H | Medium | 400 | 7 | 67% | 26% | 22% | 18% | 20% | 6% | 0% |
3 | H | Medium | 800 | 4 | 91% | 69% | 61% | 54% | 19% | 3% | 0% |
3 | H | Medium | 800 | 7 | 96% | 78% | 62% | 56% | 60% | 21% | 1% |
3 | NH | Weak | 400 | 4 | 24% | 3% | 2% | 0% | 3% | 1% | 1% |
3 | NH | Weak | 400 | 7 | 28% | 2% | 1% | 0% | 4% | 1% | 1% |
3 | NH | Weak | 800 | 4 | 48% | 13% | 11% | 1% | 8% | 4% | 4% |
3 | NH | Weak | 800 | 7 | 52% | 17% | 13% | 2% | 15% | 6% | 6% |
3 | NH | Medium | 400 | 4 | 73% | 31% | 28% | 5% | 22% | 8% | 8% |
3 | NH | Medium | 400 | 7 | 75% | 38% | 31% | 6% | 34% | 16% | 16% |
3 | NH | Medium | 800 | 4 | 98% | 86% | 73% | 25% | 58% | 19% | 19% |
3 | NH | Medium | 800 | 7 | 98% | 85% | 66% | 26% | 78% | 34% | 33% |
%LRT SIG: proportion of datasets with a significant likelihood-ratio test. Most flexible (%): proportion of datasets where the procedure identified DIF on at least the correct item-covariate pairs (possibly among others). Flexible (%): proportion of datasets where the procedure identified DIF on the correct item-covariate pairs only. Perfect (%): proportion of datasets where the procedure identified exactly the DIF that was simulated (correct form and correct pairs). Setting No. 1: the two covariates are not correlated and induce DIF on two distinct items. Setting No. 2: the two covariates are not correlated and induce DIF on the same item. Setting No. 3: the two covariates are correlated and only one induces DIF, on two items. The procedures converged on all datasets, and no identifiability issues were encountered. Results are given according to the simulation characteristics: setting, DIF form (H: homogeneous, NH: non-homogeneous), DIF size, sample size n, and number of items J.
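
The three detection criteria defined in the table note are nested, which is why every row satisfies Most flexible ≥ Flexible ≥ Perfect. The following Python sketch illustrates one way to read those definitions; it is not the authors' code. The representation of a DIF pattern as a dictionary from (item, covariate) pairs to a form label, and the names `classify_detection` and `detection_rates`, are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical representation: a DIF pattern maps each (item, covariate)
# pair to its DIF form, "H" (homogeneous) or "NH" (non-homogeneous).
Pattern = Dict[Tuple[int, str], str]

CRITERIA = ("most_flexible", "flexible", "perfect")


def classify_detection(simulated: Pattern, detected: Pattern) -> Dict[str, bool]:
    """Evaluate the three criteria of the table note for one dataset."""
    sim_pairs = set(simulated)
    det_pairs = set(detected)
    return {
        # Correct pairs detected, possibly among other (false) pairs.
        "most_flexible": sim_pairs <= det_pairs,
        # Exactly the correct pairs, regardless of the detected form.
        "flexible": sim_pairs == det_pairs,
        # Exactly the correct pairs, each with the correct form.
        "perfect": detected == simulated,
    }


def detection_rates(results: Iterable[Dict[str, bool]]) -> Dict[str, float]:
    """Proportion of datasets meeting each criterion."""
    results = list(results)
    n = len(results)
    return {c: sum(r[c] for r in results) / n for c in CRITERIA}


# Illustrative values only: homogeneous DIF simulated on two pairs.
simulated = {(1, "X1"): "H", (2, "X2"): "H"}
detected_extra = {(1, "X1"): "H", (2, "X2"): "H", (3, "X1"): "H"}
detected_wrong_form = {(1, "X1"): "H", (2, "X2"): "NH"}
detected_exact = {(1, "X1"): "H", (2, "X2"): "H"}

rates = detection_rates(
    classify_detection(simulated, d)
    for d in (detected_extra, detected_wrong_form, detected_exact)
)
```

In this toy run, the first detection meets only the most flexible criterion (an extra pair was flagged), the second meets the flexible criterion but not the perfect one (right pairs, wrong form), and the third meets all three.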