2017 Jan 11;8(1):63–79. doi: 10.3945/an.116.013144

TABLE 1.

Approaches for evaluating the relative accuracy and reliability of dietary instruments¹

		Cutoffs for assessing relative accuracy and reliability
	Definitions	Poor or failing	Acceptable
Measures of relative accuracy
Omission rate	A measure of reporting error that reflects the rate of foods observed but not reported (omissions) relative to all foods observed. Calculated as the sum of weighted values for omitted foods / (sum of weighted values for matched foods + sum of weighted values for omitted foods). Ranges from 0% to 100%.	>15%	≤15%
Intrusion rate	A measure of reporting error that reflects the rate of foods reported but not observed. Calculated as the sum of weighted values for intrusion² foods / (sum of weighted values for matched foods + sum of weighted values for intrusion foods). Ranges from 0% to 100%.	>15%	≤15%
Match rate	The ratio of foods reported to be consumed over foods that were consumed. Ranges from 0% to 100%.	<85%	≥85%
Arithmetic difference³	The signed difference (expressed in svgs) between observed and reported amounts for matched foods, in which under- and overreports can offset one another. The difference for each meal item is multiplied by a statistical weight⁴, and the weighted differences are summed for each meal for each child. The sign of the arithmetic difference indicates whether, on average, children tend to over- or underreport foods and beverages consumed.	<−0.5 or >0.5 svg/meal	−0.5 to 0.5 svg/meal
Absolute difference⁵	The absolute difference (expressed in svgs) between observed and reported amounts for matched foods, in which under- and overreports do not cancel each other. The difference for each meal item is multiplied by a statistical weight, and the weighted differences are summed for each child. This represents the average number of svgs misreported in a given meal for a group of children.	>1 svg/meal	≤1 svg/meal
2-Sample t test	A statistical test used to assess the difference between 2 means (obtained from the reference and test instrument).	Significant difference	No significant difference
Energy report rate	Reported energy consumed divided by total observed energy consumed, multiplied by 100 to give a percentage. The closer to 100%, the more valid the instrument.	<85% or >115%	85–115%
Correlation coefficient	Measures the agreement between individual values obtained from a test method and a reference method.³	r < 0.6	r ≥ 0.6
Measures of reliability⁶
Cohen’s κ coefficient	Statistical measure of the amount of agreement between 2 measures of the same concept (can be used to assess test-retest or interrater reliability).	<0.6	0.6–0.8: acceptable; >0.8: good
Intraclass correlation coefficient	A statistical test that measures intrarater reliability (how similar the ratings are for a given observer over time).	<0.6	0.6–0.8: acceptable; >0.8: good
Percentage agreement between raters	A statistical test to measure interrater reliability (how different observers rate the same observation).	<60%	60–80%: acceptable; >80%: good

¹ svg, serving.

² Intrusion foods are foods that were not observed by raters and observers but reported by children (also sometimes referred to as phantom foods).

³ Correlation coefficients provide only an indirect measure of accuracy. A correlation coefficient measures the agreement between individual values for 2 methods. For example, the individual values for protein intakes obtained by a 24-h dietary recall and an FFQ designed to measure usual daily protein intake could be highly correlated, but the absolute difference in individual values estimated via each method could substantially differ.

⁴ Subjective weights are used when adding each meal component to generate an omission or intrusion rate for the whole meal: meal entrée = 2, condiments = 0.33, and other meal items = 1 (31–33).

⁵ A limitation of arithmetic and absolute difference measures is that they are svg based, but svg sizes vary greatly across different types of foods. A 0.1-svg difference in reporting of a food with low energy density, such as lettuce, or a food typically consumed in small portions, such as ketchup, has a very different impact on the accuracy of total reported intake compared with a 0.1-svg difference for a food that is energy dense, such as pizza or a chocolate bar.
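
The weighted accuracy measures defined in the table can be illustrated with a minimal Python sketch for a single meal of a single child. The weights are those quoted in footnote 4; the function names, the data layout, and the example meal are hypothetical and chosen only for illustration. The sign convention (reported minus observed, so that a positive value indicates overreporting) is an assumption, because the table does not state which direction is subtracted.

```python
# Subjective meal-component weights quoted in footnote 4.
WEIGHTS = {"entree": 2.0, "condiment": 0.33, "other": 1.0}


def weighted_sum(components):
    """Sum the weights for a list of meal-component labels."""
    return sum(WEIGHTS[c] for c in components)


def omission_rate(matched, omitted):
    """Weighted omissions / (weighted matches + weighted omissions), in %.

    Undefined (division by zero) if both lists are empty.
    """
    m, o = weighted_sum(matched), weighted_sum(omitted)
    return 100.0 * o / (m + o)


def intrusion_rate(matched, intrusions):
    """Weighted intrusions / (weighted matches + weighted intrusions), in %."""
    m, i = weighted_sum(matched), weighted_sum(intrusions)
    return 100.0 * i / (m + i)


def arithmetic_difference(amounts):
    """Signed, weighted svg difference summed over matched foods.

    amounts: list of (component, observed_svg, reported_svg).
    Assumed convention: reported - observed, so > 0 means overreporting.
    """
    return sum(WEIGHTS[c] * (rep - obs) for c, obs, rep in amounts)


def absolute_difference(amounts):
    """Weighted svg difference in which over- and underreports do not cancel."""
    return sum(WEIGHTS[c] * abs(rep - obs) for c, obs, rep in amounts)


# Hypothetical meal: pizza (entree) and milk were observed and reported,
# ketchup (condiment) and an apple were observed but not reported,
# and a cookie was reported but not observed.
matched = ["entree", "other"]
omitted = ["condiment", "other"]
intrusions = ["other"]

# Observed vs reported servings for the two matched foods.
amounts = [("entree", 1.0, 1.5), ("other", 1.0, 0.5)]

print(round(omission_rate(matched, omitted), 1))   # 30.7 (> 15%: poor)
print(intrusion_rate(matched, intrusions))         # 25.0 (> 15%: poor)
print(arithmetic_difference(amounts))              # 0.5  (within -0.5 to 0.5)
print(absolute_difference(amounts))                # 1.5  (> 1 svg/meal: poor)
```

In this made-up meal the arithmetic difference falls inside the acceptable range while the absolute difference does not, which shows how offsetting over- and underreports can mask misreporting.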

⁶ Cutoffs for defining poor, acceptable, and good test-retest reliability and inter- and intrarater reliability were derived from methodological literature on reliability (34, 35).
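
As a rough illustration of how the reliability cutoffs in the table might be applied, the following Python sketch computes percentage agreement and Cohen's κ for 2 raters and classifies each value. The κ formula used is the standard definition, κ = (p_o − p_e) / (1 − p_e), which is not spelled out in the table; the function names and the example ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Percentage of observations on which the 2 raters give the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    agree = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * agree / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b) / 100.0
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


def classify(value, acceptable, good):
    """Apply the table's cutoffs: < acceptable is poor, > good is good."""
    if value < acceptable:
        return "poor"
    if value <= good:
        return "acceptable"
    return "good"


# Hypothetical ratings: did each of 10 children eat the fruit item?
rater_a = ["yes"] * 5 + ["no"] * 5
rater_b = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)

print(agreement, classify(agreement, 60.0, 80.0))
print(round(kappa, 2), classify(kappa, 0.6, 0.8))
```

With these ratings the raters agree on 8 of 10 observations, but half of that agreement is expected by chance, so κ is considerably lower than the raw agreement proportion.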