Indian Journal of Psychological Medicine. 2022 Nov 22;45(1):89–90. doi: 10.1177/02537176221139665

Understanding Statistical Noise in Research: 1. Basic Concepts

Chittaranjan Andrade
PMCID: PMC9896112  PMID: 36778609

Abstract

The signal is the outcome of interest in a study; it may be the value of a variable or it may be the value of a relationship between variables. Signals in research are distorted by statistical noise. This statistical noise is generated by extraneous variables that may be adequately measured, inadequately measured, unmeasured, or unknown; the subject-to-subject variation in the signal resulting from the effects of these extraneous variables is captured by the standard deviation. Thus, the standard deviation is a measure of statistical noise. This article, the first in a series, explains all of these concepts with the help of examples.

Keywords: Signal, noise, mean, standard deviation, unmeasured confounds, unknown confounds


This is the first article in a series that discusses the concept of signal and noise in research. In a hypothetical example, I deliver a lecture to a large class of postgraduate students. After the lecture, I administer a multiple choice question (MCQ) test to assess how much of my lecture the students have understood. In an ideal world, I would expect that whatever I have explained well would be well understood by all and correctly answered by all, and whatever I have explained poorly would be poorly understood by all and incorrectly answered by all. In other words, I would expect everybody to perform identically on the test. However, when I score the test, unsurprisingly, I discover that some students did well and some did poorly; the rest had scores lying in between. That is, there was much variation in performance.

Signal and Noise: Basic Concepts

In this example, the effectiveness of my teaching, operationalized as the MCQ score, is the signal that I want to identify. However, in each student this signal is distorted by a range of variables; the distortion is called noise (see Appendix). Consider: some students may already be knowledgeable on the subject, or more intelligent, and may perform better than expected. Some students may have been sleepy or distracted during the lecture and may perform worse than expected. Other sources of noise in the MCQ scores include how well students could hear my voice, how fluent students were in the language in which I lectured, how interested students were in the subject of the lecture, and whether students had other state or trait factors that influenced attention, concentration, and understanding of the contents of the lecture.

Listed above are variables that can obviously influence performance; that is, they are known sources of noise. We can measure some of these variables, such as prior knowledge and intelligence, with reasonable reliability and validity. We can measure other variables, such as students’ self-rated distraction during the lecture, with less accuracy. Still other variables may not be measurable, such as the ability of the student to reason and guess when the answer to an MCQ is not known. Finally, there may be variables that influence MCQ test performance that we have not even thought of. In other words, in research, there are adequately measured, inadequately measured, unmeasured, and unknown variables, all of which are capable of producing noise that blurs the value of the signal (see Appendix).

In the example provided, the mean MCQ score (as a measure of central tendency) defines the signal of interest in the sample. The standard deviation (SD), which reflects the typical distance of each student’s MCQ score from the mean (strictly, it is the square root of the average squared distance), is a measure of the statistical noise. Variance, which is SD², is therefore also a measure of noise (Appendix).
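As a minimal sketch of this idea, the mean, SD, and variance can be computed for a small set of MCQ scores. The ten scores below are invented for illustration and are not data from any study; the variable names are likewise assumptions.

```python
import statistics

# Hypothetical MCQ scores (out of 20) for ten students; illustrative only.
scores = [12, 15, 9, 14, 18, 11, 13, 16, 10, 12]

signal = statistics.mean(scores)          # the mean: the signal
noise_sd = statistics.stdev(scores)       # the sample SD: the noise
noise_var = statistics.variance(scores)   # the sample variance: SD squared

print(f"mean = {signal}, SD = {noise_sd:.2f}, variance = {noise_var:.2f}")
```

Here the sample SD and variance use the n - 1 denominator, which is the convention of Python's `statistics.stdev` and `statistics.variance`.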

To recapitulate, for any (continuous) variable, the mean is the signal and the variance is the noise. This variance arises from adequately measured, inadequately measured, unmeasured, and unknown factors. It is important to note that identifying a signal through noise has nothing to do with estimating the population mean using the sample mean. Signal and noise and population mean vs sample mean are unrelated constructs. Mean and SD exist in the population much as they exist in a sample. That is, extraneous variables create noise that blurs the value of a signal in the population just as they do in a sample.

Signal and Noise: Relevance to Research

In research, we test hypotheses by examining signals about relationships between variables. Because noise distorts the signal, we do not like noise. Because SDs represent noise, we do not like large SDs; it is harder to identify statistical significance when SDs are large. Because outlying values inflate the SD, we do not like outlying values that skew distributions.
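The two points about SDs can be sketched numerically. The example below uses Welch's t statistic as one possible test statistic (an assumption; the article does not name a test) and invented scores. Both pairs of groups differ by exactly 2 points in their means, but one pair has small SDs and the other has large SDs.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic: the mean difference divided by its standard error."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

# Hypothetical scores; in both pairs the group means are 15 and 13.
quiet_a, quiet_b = [14, 15, 16, 15, 15], [12, 13, 14, 13, 13]
noisy_a, noisy_b = [9, 15, 21, 12, 18], [7, 13, 19, 10, 16]

t_quiet = welch_t(quiet_a, quiet_b)   # small SDs: t is about 4.47
t_noisy = welch_t(noisy_a, noisy_b)   # large SDs: t is about 0.67

# A single outlying value inflates the SD of an otherwise tight group.
sd_without_outlier = statistics.stdev(quiet_b)
sd_with_outlier = statistics.stdev(quiet_b + [30])

print(f"t (small SDs) = {t_quiet:.2f}, t (large SDs) = {t_noisy:.2f}")
print(f"SD without outlier = {sd_without_outlier:.2f}, with outlier = {sd_with_outlier:.2f}")
```

The same 2-point signal yields a much smaller test statistic when the SDs are large, which is why noise makes statistical significance harder to reach.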

One way of pre-emptively reducing noise is to recruit homogeneous samples. So, for a clinical trial, we may recruit young, adult, drug-naïve patients who do not smoke, drink, or use other substances, who do not have medical or psychiatric comorbidities, and who cross a threshold for baseline illness severity. A plus is that such a clinically and sociodemographically homogeneous sample may have more homogeneous outcomes than a heterogeneous sample, making the signal easier to identify. A minus is that results obtained in such samples cannot be easily generalized to the real world where patients are widely heterogeneous.

The astute reader will now understand why drugs that seem to work well in clinical trials do not work as well in the real world. In clinical trials, homogeneous samples and standardized operating procedures, including standardized treatment protocols, help the signal (if any) become more easily detectable [1]. So, a treatment may be seen to work in ideal patients treated in an ideal way. In the real world, heterogeneity in patient populations and in treatment environments generates variability (noise) that makes the signal (treatment efficacy) more difficult to observe.
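The contrast between homogeneous and heterogeneous samples can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real trial: the true effect of 10 points, the spread of the extraneous variable in each sample, and the function name `simulate_improvement` are all hypothetical.

```python
import random
import statistics

def simulate_improvement(n, extraneous_sd, rng, true_effect=10.0):
    """Improvement score = true effect + extraneous-variable effect + residual noise."""
    return [
        true_effect + rng.gauss(0, extraneous_sd) + rng.gauss(0, 1.0)
        for _ in range(n)
    ]

rng = random.Random(42)  # fixed seed so that the run is repeatable

# Assumed spreads: patients differ little in the trial, widely in the real world.
homogeneous = simulate_improvement(1000, extraneous_sd=0.5, rng=rng)
heterogeneous = simulate_improvement(1000, extraneous_sd=3.0, rng=rng)

sd_homogeneous = statistics.stdev(homogeneous)
sd_heterogeneous = statistics.stdev(heterogeneous)

print(f"homogeneous:   mean = {statistics.mean(homogeneous):.2f}, SD = {sd_homogeneous:.2f}")
print(f"heterogeneous: mean = {statistics.mean(heterogeneous):.2f}, SD = {sd_heterogeneous:.2f}")
```

In this sketch the underlying signal is identical in both samples; only the noise around it differs, so the same effect is harder to see in the heterogeneous sample.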

The next article in this series will address noise in randomized controlled trials and in observational studies, and how such noise can be pre-emptively and statistically addressed.

Footnotes

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author received no financial support for the research, authorship, and/or publication of this article.

Appendix

  1. Radio transmission and static classically exemplify signal and noise; however, noise in radio signals blurs the clarity of the signal, whereas noise in research blurs the value of the signal.

  2. In the MCQ test example provided at the beginning of this article, signal and noise are explained in the context of mean and SD for a single normally-distributed continuous variable. The concept of signal and noise also applies to continuous variables with non-normal distributions, to categorical variables, and, importantly, to inferential statistical procedures that examine relationships between variables in research.

  3. Smoking is a well measured variable when it is operationalized into a set of variables that includes age at onset of smoking, years of smoking, number of cigarettes smoked per day, nicotine and tar content of cigarettes, extent of inhalation of smoke, etc. However, when smoking is measured solely as yes or no, it is an inadequately measured variable.

  4. The stress and support that study subjects experience commonly influence research outcomes. Stress and support are excellent examples of commonly unmeasured variables that create noise that blurs the value of a signal in clinical research.

  5. Genes almost certainly influence far more than we realize and understand. So, in many research situations, genetic influences are unknown variables that are a source of noise that blurs the value of a signal.

  6. When noise variables are well measured, their influence can be adjusted for in statistical analysis; the value of the signal then becomes clearer. When variables are inadequately measured, unmeasured, or unknown, they cannot be properly adjusted for, or cannot be adjusted for at all, and so the value of the signal remains inaccurate.
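Note 6 can be sketched with a simple least-squares adjustment. The example assumes a single well-measured noise variable (prior knowledge, rated 1 to 6) and six invented MCQ scores; simple linear regression is used here as one possible adjustment method, not as the method prescribed by the article.

```python
import math
import statistics

# Hypothetical data: prior-knowledge rating and MCQ score for six students.
prior_knowledge = [1, 2, 3, 4, 5, 6]
scores = [9, 12, 11, 15, 16, 18]

mean_x = statistics.mean(prior_knowledge)
mean_y = statistics.mean(scores)

# Ordinary least-squares slope and intercept, computed by hand.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prior_knowledge, scores))
s_xx = sum((x - mean_x) ** 2 for x in prior_knowledge)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residuals: the variation in scores that prior knowledge does not explain.
residuals = [y - (intercept + slope * x) for x, y in zip(prior_knowledge, scores)]

raw_sd = statistics.stdev(scores)
# The residual SD uses n - 2 degrees of freedom (two estimated coefficients).
residual_sd = math.sqrt(sum(r ** 2 for r in residuals) / (len(scores) - 2))

print(f"SD before adjustment = {raw_sd:.2f}, after adjustment = {residual_sd:.2f}")
```

Once the measured variable is accounted for, the remaining noise is smaller. No comparable adjustment is possible for a variable that was never measured, because it cannot enter the regression at all.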

Reference

  1. Andrade C. Signal to noise ratio, variability, and their relevance in clinical trials. J Clin Psychiatry 2013; 74: 479–481.

Articles from Indian Journal of Psychological Medicine are provided here courtesy of Indian Psychiatric Society South Zonal Branch
