2023 Jan 25;2(1):e0000171. doi: 10.1371/journal.pdig.0000171

Table 5. Triangulating the validity of Fitbit-derived PA metrics.

	Step count	Time in PA	Time in MVPA
Did criterion measures agree with each other?
Scripted tasks	+	++	++
Free living, Epoch level	--	++	+/++
Free living, Daily level	--	+	+
Free living, Average level	-	-	++
Consistency across severity strata*	+	+	-
Did criterion measures correlate with each other?
Scripted tasks	++	na	na
Free living, Epoch level	-	na	na
Free living, Daily level	++	++	++
Free living, Average level	++	++	++
Consistency across severity strata*	+	+	-
Did Fitbit agree with criterion measures?
Scripted tasks	+	++	--
Free living, Epoch level	--	+/++	-/+
Free living, Daily level	-/+	--/-	+
Free living, Average level	+	-/+	+/++
Consistency across severity strata*	-	-	-
Did Fitbit meet or exceed the agreement exhibited by criterion measures?	+	-/+	-
Did Fitbit correlate with criterion measures?
Scripted tasks	++	na	na
Free living, Epoch level	-	na	na
Free living, Daily level	++	+/++	+
Free living, Average level	++	+/++	+/++
Consistency across severity strata*	-/+	-	-
Did Fitbit meet or exceed the correlations exhibited by criterion measures?	+	-/+	-
Did Fitbit relate to clinical outcomes?
Did Fitbit exhibit the expected correlations with clinical measures?	+	+	+
Did Fitbit-derived PA metrics differ across known groups?	+	-	+
Did Fitbit meet or exceed the relationships exhibited by criterion measures?	+	+	+
*Validity of Fitbit-derived PA metrics*
Can criterion measures be considered equivalent?	-	-	-
Can Fitbit and Actigraph be considered equivalent?	-	-	-
Did Fitbit exhibit evidence of construct validity?	+	+	+

++: Excellent agreement or strong correlations (0.75–1.0)

+: Fair to good agreement or moderate correlations (0.4–0.75)

-: Poor agreement or weak correlations (0.2–0.4)

--: Very weak or complete lack of agreement or correlation (<0.2)

+/-: Evidence was mixed

Binary yes/no responses are indicated with + or -, respectively

*Consistency across severity strata describes whether the trends observed during scripted tasks and at the epoch, daily, and average levels were consistent across subgroups with mild, moderate, and severe MS. A positive rating in this category does not necessarily mean that measures correlated or agreed with each other.
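
The legend thresholds can be read as a simple lookup from a coefficient to a rating symbol. The following is a minimal sketch of that reading, not the authors' procedure. The names `rating_symbol` and `summarize` are hypothetical, and the treatment of boundary values (for example, whether 0.75 counts as ++ or +) is an assumption, since the legend's ranges share endpoints. Reporting mixed evidence as the lowest and highest symbol joined by a slash is likewise an assumption, based only on how cells such as -/+ and +/++ appear in the table.

```python
# Thresholds copied from the table legend. Assigning each boundary value
# to the higher category is an assumption, not stated in the source.
THRESHOLDS = [
    (0.75, "++"),  # excellent agreement or strong correlation
    (0.40, "+"),   # fair to good agreement or moderate correlation
    (0.20, "-"),   # poor agreement or weak correlation
]

# Symbols ordered from weakest to strongest, used to sort mixed ratings.
ORDER = ["--", "-", "+", "++"]


def rating_symbol(value: float) -> str:
    """Map a coefficient in [0, 1] to a legend symbol."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("expected a coefficient between 0 and 1")
    for lower_bound, symbol in THRESHOLDS:
        if value >= lower_bound:
            return symbol
    return "--"  # very weak or no agreement/correlation (< 0.2)


def summarize(values) -> str:
    """Collapse several coefficients into one cell (hypothetical rule).

    Returns a single symbol when all values fall in the same category,
    otherwise the weakest and strongest symbols joined by a slash.
    """
    if not values:
        raise ValueError("expected at least one coefficient")
    symbols = sorted({rating_symbol(v) for v in values}, key=ORDER.index)
    if len(symbols) == 1:
        return symbols[0]
    return f"{symbols[0]}/{symbols[-1]}"


print(rating_symbol(0.82))      # ++
print(summarize([0.35, 0.55]))  # -/+
```

The sketch covers only the graded symbols. The binary yes/no rows and the "na" cells are outside what the thresholds describe and are not modeled here.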