Author manuscript; available in PMC: 2008 Aug 18.
Published in final edited form as: J Am Diet Assoc. 2006 Oct;106(10):1575–1587. doi: 10.1016/j.jada.2006.07.003

Glossary.

Definitions of common statistical terms and their use in the usual food intake model

| Statistical Term | Definition | Use in usual food intake model |
|---|---|---|
| Statistical Model | A model is a mathematical formula used to quantify the relationship between two or more variables. The model is “statistical” when it also incorporates uncertainty in the relationship between the variables. | A statistical model is used to estimate usual food intake and to relate it to other variables of interest. |
| Two-part model | Sometimes there is a need for a more complex statistical model that includes two component parts. | In this paper, the food consumption model describes both the probability of consuming a food and the usual amount consumed. |
| Outcome Variable | The variable of interest in the analysis. Sometimes referred to as the dependent variable. | Usual food intake is the outcome variable for the two-part model. It is derived as the probability of consuming the food multiplied by the amount consumed on a consumption day. |
| Covariates | Variables that are related to the outcome variable. The term “covariate” describes a general class of variables that may be of primary interest, define a subpopulation, or need to be adjusted for in the statistical analysis. Sometimes referred to as independent variables. | Responses to line items from a food frequency questionnaire may be used as covariates to improve on estimation from the 24HR alone; variables such as age and race may be used to define subpopulations for which usual intake is estimated. |
| Person-specific random effect | A term specific to an individual that describes how that individual's value deviates from the average. It is considered “random” because the individuals in the study are treated as a random sample from a larger population. | Both parts of the statistical model include a person-specific random effect: one describes the individual's frequency of consuming a particular food, the other the amount consumed. |
| Normality | Refers to a statistical distribution of a variable, specifically a bell-shaped curve. The tails of the normal distribution refer to the extreme values. If the data do not follow a normal distribution, then many commonly used statistical methods cannot appropriately be used. By applying a function, such as the logarithm, data can be transformed to a more nearly normal distribution. | The amount part of the model is transformed to normality using a Box-Cox (power) transformation. In the model, the normality assumption must hold for the random effects after the covariates of interest are included. Including the covariates may help to make the distribution of these random effects more nearly normal. |
| Correlation | If two variables are associated with each other, they are said to be correlated. The opposite of correlation is independence, in which a change in one variable does not impact the value of another variable. | In this model, there are two types of correlation. First, the two person-specific effects are correlated. This means that we allow the individual's tendency to consume a food to be related to the amount that he or she consumes. Second, the covariates in the model are correlated with the outcome (food intake). For example, persons who report a higher frequency of intake on the FFQ generally have a higher probability of consuming a food on the 24HR. |
| Simulation Study | A simulation study is a method that statisticians use to validate their models. Many hypothetical random samples are generated (i.e., simulated), and statistical estimates are computed for each sample. The results are then averaged and compared to the “truth” that was used to generate the samples. | In this paper, simulations were used to generate 365 days of pseudo-data for a series of individuals. Then different statistical methods for estimating the distribution of usual intake were run on the same generated data sets. Finally, these estimated distributions were compared to truth. |

FFQ: Food Frequency Questionnaire

24HR: Twenty-four hour recall
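
Several of the glossary terms (two-part model, outcome variable, person-specific random effect, correlation, simulation study) can be illustrated together with a small simulation. The following is a minimal sketch, not the simulation design used in the paper: every parameter value (`mu_prob`, `mu_amt`, `sd_u1`, `sd_u2`, `rho`, `sd_within`) and both function names are hypothetical, covariates are omitted, and a log transformation (the limiting case of the Box-Cox transformation as the power goes to zero) stands in for a general power transformation. Each simulated person receives two correlated random effects, one for the probability part and one for the amount part, and the "true" usual intake is the probability of consumption multiplied by the mean amount on a consumption day.

```python
import math
import random
import statistics


def simulate_person(rng, n_days=365, mu_prob=-0.5, mu_amt=4.0,
                    sd_u1=1.0, sd_u2=0.4, rho=0.3, sd_within=0.6):
    """Simulate one person's daily intakes under a simplified two-part model.

    All default parameter values are illustrative assumptions.
    Returns (true usual intake, list of n_days daily intakes).
    """
    # Correlated person-specific random effects:
    # u1 enters the probability part, u2 the amount part.
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    u1 = sd_u1 * z1
    u2 = sd_u2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)

    # Part 1: probability of consuming the food on any given day (logistic link).
    p = 1 / (1 + math.exp(-(mu_prob + u1)))

    # Part 2: amount on a consumption day is normal on the log scale,
    # so its mean on the original scale is exp(mean + variance / 2).
    mean_amount = math.exp(mu_amt + u2 + sd_within ** 2 / 2)

    # Outcome variable: usual intake = probability x amount on a consumption day.
    true_usual = p * mean_amount

    days = []
    for _ in range(n_days):
        if rng.random() < p:
            days.append(math.exp(mu_amt + u2 + rng.gauss(0, sd_within)))
        else:
            days.append(0.0)  # non-consumption day
    return true_usual, days


def simulate_population(n_people=500, seed=1):
    """Simulate many people; return truth, a single day, and the 365-day mean."""
    rng = random.Random(seed)
    people = [simulate_person(rng) for _ in range(n_people)]
    truth = [t for t, _ in people]
    one_day = [days[0] for _, days in people]
    year_mean = [sum(days) / len(days) for _, days in people]
    return truth, one_day, year_mean


if __name__ == "__main__":
    truth, one_day, year_mean = simulate_population()
    print("mean of true usual intake:", statistics.mean(truth))
    print("mean of 365-day averages: ", statistics.mean(year_mean))
    print("SD of true usual intake:  ", statistics.stdev(truth))
    print("SD of single-day intake:  ", statistics.stdev(one_day))
```

In this sketch the 365-day averages should track the true usual intakes closely, while single-day intakes are far more variable across people because they mix within-person and between-person variation. That gap is the reason a single recall day does not describe the distribution of usual intake directly.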