2023 Dec 11;13:21913. doi: 10.1038/s41598-023-48892-x

Table 3.

GRADE table: Should COVID-19 self-testing, defined as self-sampling, processing of the sample and self-readout using Ag-RDTs, be offered as an additional approach to professionally administered testing services? The following table summarizes the certainty of evidence according to the GRADE approach.

Certainty assessment							Impact	Certainty	Importance
№ of studies	Study design	Risk of bias	Inconsistency	Indirectness	Imprecision	Other considerations	Impact	Certainty	Importance
Accuracy—sensitivity (Ag-RDT self-testing vs. rRT-PCR)
23^11,24–34	Observational studies	Not serious^a	Not serious^b	Not serious^c	Not serious^d	None	Normalized to a study population with 1000 participants and 10% prevalence, 66 true positive and 34 false negative self-testing results were reported. Pooled sensitivity was 66.1% (95% CI 53.5 to 76.7)	⨁⨁⨁⨁ High	CRITICAL
Accuracy—specificity (Ag-RDT self-testing vs. rRT-PCR)
23^11,24–34	Observational studies	Not serious^a	Not serious^b	Not serious^c	Not serious^d	None	Normalized to a study population with 1000 participants and 10% prevalence, 874 true negative and 2 false positive self-testing results were reported. Pooled specificity was high at 99.5% (95% CI 99.1 to 99.7)	⨁⨁⨁⨁ High	CRITICAL
Accuracy—concordance (Ag-RDT self-testing vs. Ag-RDT performed by professionals)
1^11	Observational studies	Not serious^a	Serious^b	Not serious^c	Serious^d	None	Kappa: 0.92 (out of 1.00; 95% CI 0.89 to 0.95)	⨁⨁◯◯ Low	CRITICAL
Accuracy—Proportion of user errors
1^11	Observational studies	Not serious^a	Serious^b	Not serious^c	Not serious^e	None	Study participants deviated from the instructions in 15.5% of the sampling steps and 15.0% of the testing steps. However, these deviations did not impede the self-test's performance	⨁⨁◯◯ Low	IMPORTANT

Explanation: ^aWe used QUADAS-2 to assess risk of bias. The studies enrolled patients consecutively and assessed the results of self-testing (defined as self-sampling and self-performing the Ag-RDT) blinded to the reference standard result (rRT-PCR or professional Ag-RDT testing). While for one study it was not clear whether all self-tests were performed as per the manufacturer's instructions, this was ensured in the others. Furthermore, we could not detect any potential bias resulting from the study flow and timing. Therefore, we did not downgrade the quality of evidence for this criterion.

^bThe heterogeneity/inconsistency in findings, as shown by the wide-ranging point estimates with only marginally overlapping confidence intervals, is likely to originate from differences in the study populations. This interpretation is supported by the fact that the head-to-head comparison between self-testing and professional testing in the same study population shows similar performance of Ag-RDTs. However, as there are only a few studies available for concordance and one study for user errors, we downgraded these two outcomes by one level.

^cFollowing current GRADE guidance, we did not downgrade by one level for all studies, but we acknowledge that the study populations are not fully representative of the populations of interest. Furthermore, the intervention did not differ from the one of interest and outcomes were reported directly; therefore, indirectness was judged 'not serious'.

^dThe number of studies and the sample size were small, and only one study reported on concordance between self-testing and professional testing using Ag-RDTs.

^eFor this outcome, only qualitative data or quantitative data from isolated studies in well-described but non-comparable settings were available; therefore, the criterion 'imprecision' is negligible and was rated 'not serious'.
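
The normalization used in the sensitivity row (1000 participants, 10% prevalence) can be illustrated with a short calculation. This is a minimal sketch, not the authors' analysis code: the function name `expected_counts` and its parameters are hypothetical, and it simply applies pooled point estimates to a notional cohort. Only the sensitivity row is checked here; the specificity counts in the table may follow a different derivation and are not expected to be reproduced by this arithmetic.

```python
def expected_counts(sensitivity, specificity, prevalence=0.10, n=1000):
    """Expected 2x2 counts when point estimates are applied to a notional cohort.

    All names are illustrative assumptions; the inputs are proportions in [0, 1].
    """
    infected = n * prevalence          # participants positive by the reference test
    uninfected = n - infected          # participants negative by the reference test
    tp = sensitivity * infected        # self-test positive, reference positive
    fn = infected - tp                 # self-test negative, reference positive
    tn = specificity * uninfected      # self-test negative, reference negative
    fp = uninfected - tn               # self-test positive, reference negative
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}


# Pooled estimates from the table: sensitivity 66.1%, specificity 99.5%.
counts = expected_counts(0.661, 0.995)
print(round(counts["TP"]), round(counts["FN"]))  # 66 true positives, 34 false negatives
```

With 100 infected participants in the notional cohort, a sensitivity of 66.1% corresponds to roughly 66 detected and 34 missed infections, which matches the counts given in the sensitivity row.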