Deutsches Ärzteblatt International
2024 Feb 23;121(4):128–134. doi: 10.3238/arztebl.m2023.0278

Regression Analyses and Their Particularities in Observational Studies

Part 32 of a Series on Evaluation of Scientific Publications

Antonia Zapf 1,*, Christian Wiessner 1, Inke Regina König 2
PMCID: PMC11019761  PMID: 38231741

Abstract

Background

Regression analysis is a standard method in medical research. It is often not clear, however, how the individual components of regression models are to be understood and interpreted. In this article, we provide an overview of this type of analysis and discuss its special features when used in observational studies.

Methods

Based on a selective literature review, the individual components of a regression model for differently scaled outcome variables (metric: linear regression; binary: logistic regression; time to event: Cox regression; count variable: Poisson or negative binomial regression) are explained, and their interpretation is illustrated with respect to a study on multiple sclerosis. The prerequisites for the use of each of these models, their applications, and their limitations are described in detail.

Results

Regression analyses are used to quantify the relation between several variables and the outcome variable. In randomized clinical trials, this flexible statistical analysis method is usually lean and prespecified. In observational studies, where there is a need to control for potential confounders, researchers with knowledge of the topic in question must collaborate with experts in statistical modeling to ensure high model quality and avoid errors. Causal diagrams are an increasingly important basis for evaluation. They should be constructed in collaboration and should differentiate between confounders, mediators, and colliders.

Conclusion

Researchers need a basic understanding of regression models so that these models will be well defined and their findings will be fully reported and correctly interpreted.


cme plus+

This article has been certified by the North Rhine Academy for Continuing Medical Education. The CME questions on this article can be found at http://daebl.de/RY95. The closing date for entries is February 22, 2025.

Participation is possible at cme.aerzteblatt.de

Regression analyses form an essential part of medical research (1, 2). The basic idea is that an outcome variable is explained by predictor variables. In a cross-sectional study, the aim is generally to show associations between various variables. Due to the lack of a temporal sequence, it is not possible to make a statement about causality without further consideration. In a prospective/longitudinal study, the effect of various variables on an outcome variable can be evaluated to allow predictions to be made or to explain associations.

In randomized controlled trials (RCTs), structural equality is created in principle by random allocation to the intervention groups. Accordingly, the models in RCTs are very lean and should primarily include the intervention group, the baseline value (if possible) and key factors that were used to form groups upon randomization (stratification variables) (3). In observational studies, by contrast, a large number of independent variables can and at times must be included in the model.

In this series of articles on the evaluation of scientific publications, an article on linear regression analysis has already been published (1). In addition, regression analyses have been discussed in other articles (4–6).

The aim of our article is, first, to provide an overview of the components of regression analyses and, second, to address the particularities of observational studies. The prerequisites, possibilities, and limitations of specific regression models are explained and discussed in the appendix. Table 1 provides a glossary of the technical terms used.

Table 1. Glossary of technical terms used in the area of regression analysis, illustrated using the example of the multiple sclerosis (MS) study.

Term Explanation
Outcome variable The outcome variable is the variable of interest which is to be explained. It is used to measure the effects of treatment (success rate) or it is to be prognosticated. Commonly used synonyms include: response variable, dependent variable, and endpoint. In the MS study, the occurrence of a relapse may be used as the outcome variable.
Predictor variable The predictor variable is the variable that is assumed to have an effect on, for example, the success rate of a treatment or the prognosis, or for which an association with the outcome variable is generally assumed. Commonly used synonyms include: independent variable and explanatory variable or—in epidemiological studies—exposure. In the MS study, the intervention is the key predictor variable.
Interaction In an interaction, the effect of a predictor variable on an outcome variable depends on a third variable. In the MS study, it is conceivable that the experimental intervention may be of greater benefit to patients with shorter disease duration compared to patients with longer disease duration.
Confounder A confounder is a variable which has an effect on both treatment/exposure and the outcome variable. Randomization can be used to counteract confounding. In regression analyses, it is possible to adjust for known confounders. In the MS study, the number of relapses in the 12 months prior to study inclusion could be a confounder.
Mediator A mediator is a variable which lies between treatment/exposure and outcome variable. Mediation analyses can uncover the mechanism by which treatment/exposure exerts an effect on the outcome variable. In order to estimate the overall effect of treatment/exposure on an outcome variable, it is not adjusted for mediators in regression models. If the direct effect of a treatment/exposure on an outcome variable is of interest, adjustment for mediators is made. In the MS study, patient adherence could be a mediator.
Collider A collider is a variable which is influenced by both treatment/exposure and the outcome. In regression analyses, one should not adjust for colliders. The number of follow-up visits is a potential collider in the MS study.
Intercept The intercept is the value the outcome variable has when all predictor variables are equal to zero.
Slope parameter The slope parameter (slope coefficient) always refers to a specific predictor variable and indicates by how much the value of the outcome variable changes if the associated predictor variable increases by one unit.

Methods

Based on a selective search of the literature in the PubMed database and with Perplexity.ai, the individual components of a regression model are explained for outcome variables of different scales. Their interpretation is illustrated using a study on multiple sclerosis as an example. The requirements for the use of each of these models, their applications, and their limitations are described in detail in the eMethods.

Components of regression analyses

A regression model can be compared to a kit with various possible components. In this section, we will use a fictitious study on multiple sclerosis to illustrate these components.

Fictitious example study

Multiple sclerosis (MS) is a chronic inflammatory neurological disease which can take a relapsing-remitting or a progressive course. It is characterized by symptoms such as paralysis and sensory disturbances caused by demyelination and degradation of nerve fibers. The lesions in the brain can be visualized using magnetic resonance imaging (MRI). For this fictitious study, we assume that an experimental intervention (E) is to be compared to the standard of care (S). Because the time since diagnosis is a major prognostic factor, randomization is stratified by time since diagnosis (<5 versus ≥5 years). This means that patients with at least 5 years and patients with less than 5 years since diagnosis are randomized separately to the two treatment groups. The aim of this approach is to ensure that both intervention groups are equally represented within the two strata.

Outcome variable

The outcome variable is the variable of interest. The scale on which the outcome variable is measured determines the class of regression model to be used. In MS, disease progression can be defined in various ways. For our fictitious study, we use the following outcome variables:

  • The time a patient takes to walk 25 feet (just under ten meters), known as Timed 25-Foot Walk (T25FW) (7). Alternatively, this time can be converted to feet per second; this is what we will work with in the following. In addition, the change from the baseline value (after minus before the intervention) will be used as an outcome variable (8): As this is a metric variable, a linear regression is used.

  • Relapse within 6 months (yes versus no): As this is a dichotomous variable, a logistic regression is used (9).

  • Time to relapse: As this is a time-to-event variable, a Cox regression is used (10).

  • Number of new lesions in the period of 2 years (determined using MRI): As this is a count variable, a Poisson regression is used (11).

In the eMethods, the various regression models (linear, logistic, Cox, and Poisson regressions) are described in detail, with their assumptions, and applied to the example study with comprehensive interpretation of the results.
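The decision rule in the bullet points above can be summarized in a small sketch. The function name and scale labels are ours, chosen for illustration only; they are not part of any statistics library:

```python
def choose_regression(outcome_scale):
    """Map the scale of the outcome variable to the regression model class
    described in the text (hypothetical helper for illustration)."""
    mapping = {
        "metric": "linear regression",          # e.g. change in T25FW score
        "dichotomous": "logistic regression",   # e.g. relapse within 6 months
        "time-to-event": "Cox regression",      # e.g. time to relapse
        "count": "Poisson regression",          # e.g. number of new lesions
    }
    return mapping[outcome_scale]
```

The point of the sketch is simply that the outcome scale, not the predictors, determines the model class.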

In most cases, one outcome variable is modeled; the resulting regression model is referred to as univariate. If several outcome variables are to be modeled jointly (for example, the T25FW score in the morning and in the evening), the regression model is referred to as multivariate. Multivariate regression is not discussed in more detail in this article; interested readers are referred to Zelterman (2015), among others (12). It should be noted that in the literature the term “multivariate regression” is often used incorrectly to mean “multiple regression” (13).

Predictor variable

The predictor variable is the variable whose effect is to be investigated. In epidemiological studies, the term exposure is generally used. The scale level of predictor variables can take any form. In the case of a single predictor variable, the term univariable regression or simple regression is used; in the case of two or more independent variables, the term multivariable or multiple regression is used. The primary result of a regression analysis relates to the predictor variables. In general, the regression coefficient, the corresponding (typically two-sided 95%) confidence interval, and the p-value for the test that the coefficient equals 0 are reported for each independent variable. Useful details on the procedure and reporting of regression models can be found in the literature (14, 15).

In the eMethods, concrete models are created, regression coefficients, confidence intervals and p-values are interpreted, and the necessary requirements are explained, all using the fictitious example study. Table 2 provides a summary of the results and interpretations of the fictitious study, separately for the various outcome variables. For the sake of completeness, the various intercept results are also given (for explanation see eMethods and the section “Common mistakes”); however, the intercept is generally not taken into account in the interpretation and is therefore not interpreted in Table 2.

Table 2. Interpretation of regression coefficients depending on the regression model used, for the example multiple sclerosis (MS) study.

Outcome variable Scale Regression model Results β [95% CI] Interpretation
ΔT25FW score
(walking speed as change to baseline in feet per second)
Metric Linear regression ● Intercept:
0.14 [–0.10; 0.38]
● Time since diagnosis:
–0.01 [–0.02; 0.04]
● Intervention group:
0.40 [0.17; 0.63]
● Baseline value:
0.05 [0.03; 0.07]
The experimental treatment results in an increase in walking speed that is greater by 0.4 feet per second. A higher baseline value is also associated with a greater improvement (0.05 feet per second per additional foot per second at baseline). The association between time since diagnosis and the ΔT25FW score is very weak (0.01 feet per second less per year since diagnosis).
Relapse within 6 months
(yes versus no)
Dichotomous Logistic regression ● Intercept:
–0.73 [–1.42; 0.06]
● Intervention group:
–0.41 [–0.84; –0.07] → Odds Ratio (OR) = exp(–0.41) = 0.66 [0.43; 0.93]
● Time since diagnosis:
0.10 [–0.02; 0.29] → OR = exp(0.10) = 1.11 [0.98; 1.34]
The experimental treatment reduces the chance (eMethods) of at least one relapse in the next six months by a factor of 0.66. For each additional year that patients have the disease, the odds of at least one relapse increase by 11%.
Time to relapse Time to event Cox regression ● Intervention group:
–0.74 [–1.14; –0.39] → Hazard Ratio (HR) = exp(–0.74) = 0.48 [0.32; 0.68]
● Time since diagnosis:
0.15 [–0.05; 0.40] → HR = exp(0.15) = 1.16 [0.95; 1.49]
The experimental treatment results in a reduced risk of at least one relapse during the observation period by a factor of 0.48 compared to the control group. For each additional year that patients have the disease, the risk of at least one relapse increases by 16%.
Number of lesions Count variable Poisson regression ● Intercept:
0.16 [–0.04; 0.39]
● Intervention group:
–0.69 [–1.02; –0.19] → Incidence Rate Ratio (IRR) = exp(–0.69) = 0.50 [0.36; 0.83]
● Time since diagnosis:
0.18 [–0.13; 0.51] → IRR = exp(0.18) = 1.20 [0.88; 1.66]
● Baseline value:
0.05 [–0.27; 0.42] → IRR = exp(0.05) = 1.05 [0.76; 1.52]
The experimental treatment results in a 50% reduction in the number of new lesions. For each additional year that patients have the disease, the number of new lesions increases by 20%. For each additional lesion compared to baseline, the number of new lesions increases by 5%.

For detailed derivation and interpretation see eMethods; T25FW, Timed 25-Foot Walk (time in seconds a patient takes to walk 25 feet [just under 10 meters]);

CI, confidence interval
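The odds, hazard, and incidence rate ratios in Table 2 are obtained by exponentiating the regression coefficients and their confidence limits. A minimal sketch of this back-transformation (the function name is ours), checked against the intervention-group rows of the logistic and Cox models:

```python
import math

def ratio_with_ci(beta, ci_low, ci_high):
    """Exponentiate a log-scale coefficient and its confidence limits
    to obtain an OR, HR, or IRR with its confidence interval."""
    return tuple(round(math.exp(v), 2) for v in (beta, ci_low, ci_high))

# Logistic regression, intervention group (Table 2):
or_est = ratio_with_ci(-0.41, -0.84, -0.07)   # → (0.66, 0.43, 0.93)
# Cox regression, intervention group (Table 2):
hr_est = ratio_with_ci(-0.74, -1.14, -0.39)   # → (0.48, 0.32, 0.68)
```

Because the transformation is monotonic, the confidence limits can be exponentiated directly.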

Figure 1 shows, for the example research question, a scatter plot and the resulting regression line of a simple linear regression of the effect of disease duration (independent variable x, in years) on the change in T25FW score (dependent variable y, ΔT25FW in feet per second): ΔT25FW = 0.137 − 0.01 × years since diagnosis.

Figure 1.

Scatter plot with regression line of a simple regression, using the example of the multiple sclerosis (MS) study on the effect of disease duration (predictor variable x, in years) on the change in T25FW score (outcome variable y, ΔT25FW in feet per second). The figure was generated using the Python programming language and ChatGPT.

The intercept is the value at which the regression line intersects the y-axis (x = 0; here 0.137), and the slope parameter (slope) is the value by which the regression line rises (or falls) when the value on the x-axis increases by one unit (here –0.01).
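Intercept and slope can be recovered numerically with ordinary least squares. The following sketch uses noiseless, hypothetical data placed exactly on the reported regression line (not the study data), so the fit returns the published parameters:

```python
import numpy as np

# Hypothetical years-since-diagnosis values (illustrative only)
years = np.array([1.0, 2.0, 4.0, 6.0, 8.0, 10.0])
# Points placed exactly on the reported line: ΔT25FW = 0.137 − 0.01 × years
delta_t25fw = 0.137 - 0.01 * years

# Ordinary least squares fit of a first-degree polynomial (a line)
slope, intercept = np.polyfit(years, delta_t25fw, deg=1)
# slope ≈ −0.01, intercept ≈ 0.137
```

With real, noisy data the estimates would scatter around these values, and the confidence intervals reported in Table 2 quantify that uncertainty.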

Interaction term

An important requirement for regression models is the additivity of the effects of the individual independent variables. In other words, the various independent variables do not influence each other's effect on the outcome variable. If, on the other hand, a so-called interaction is thought to exist, a suitable interaction term can be included in the model. However, in order to facilitate a better interpretation of the results, it is generally advisable to only consider interactions that are plausible from a content perspective (16). In the example study, it could be assumed that the experimental intervention is effective in patients with a relapsing-remitting disease course but without benefit in patients with progressive multiple sclerosis. In this case, it would be advisable to stratify the randomization by the factor “type of disease course”. In any case, however, this independent variable should be included in the model and the interaction term “intervention group × type of course” should also be added. From a statistical perspective, the evidence for an interaction is stronger the smaller the corresponding p-value is. If it is concluded that there is a relevant interaction, the analysis should be performed in a stratified manner. In such a case, the intervention effect should be interpreted separately for progressive and relapsing-remitting disease courses.
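The interaction just described can be sketched with simulated data. The effect sizes, sample size, and coding below are ours, chosen only to show how an interaction term recovers a subgroup-specific treatment effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
group = rng.integers(0, 2, n)         # 0 = standard of care, 1 = experimental
progressive = rng.integers(0, 2, n)   # 0 = relapsing-remitting, 1 = progressive
# Assumed truth: treatment improves the outcome (+0.4) only in
# relapsing-remitting patients, not in progressive MS
y = 0.4 * group * (1 - progressive) + rng.normal(0, 0.1, n)

# Design matrix: intercept, both main effects, and the interaction term
X = np.column_stack([np.ones(n), group, progressive, group * progressive])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[1] ≈ 0.4  (treatment effect in relapsing-remitting patients)
# beta[3] ≈ −0.4 (interaction: the effect vanishes in progressive patients)
```

The interaction coefficient beta[3] quantifies by how much the treatment effect differs between the two disease courses, which is exactly what a stratified interpretation would report.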

Dependent measurements

It is common for clinical studies to have several observations per patient: either at a single point in time (for example, several lesions on MRI) or over time (multiple study visits). However, there may also be other factors that lead to dependencies in the data. For example, in a study in which practices are assigned to intervention groups (so-called cluster randomization), the data of patients within one practice would be dependent. In order to take such dependencies into account, random effects are introduced into the regression model. This means that individual intercepts (random intercepts) or slope parameters (random slopes) are estimated for the individual units (e.g. patient or center) with several measurements. In the case of multiple measurements, the patient would then be the random effect; in the case of multicenter trials, it would be the center. More details can be found in the literature (17, 18). Examples of alternative approaches to the evaluation of data with dependent measurements include generalized linear models (GLMs) and generalized estimating equations (GEEs) (19, 20).
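How strongly ignoring such dependence distorts the analysis can be approximated with the design effect, 1 + (m − 1) · ICC, where m is the cluster size and ICC the intraclass correlation. This formula is standard for cluster-randomized designs but is not derived in this article; the sketch below is ours:

```python
def design_effect(cluster_size, icc):
    """Variance inflation factor for cluster-randomized data:
    1 + (m - 1) * ICC, with m observations per cluster and
    ICC the intraclass correlation coefficient."""
    return 1 + (cluster_size - 1) * icc

# A practice with 20 patients and a modest ICC of 0.05 nearly doubles
# the variance relative to the same number of independent observations:
deff = design_effect(20, 0.05)  # → 1.95
```

A naive analysis that treats clustered patients as independent therefore reports confidence intervals that are too narrow; random-effects models or GEEs correct for this.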

Particularities of observational studies

Even though randomized controlled trials (RCTs) offer the highest level of evidence, it is not always possible to conduct an RCT. For example, randomization may not be possible or ethically justifiable for certain reasons (e.g. smoking as a risk factor for multiple sclerosis). In our example, it would be conceivable that patients who do not wish to be randomized are included in a registry and observed over the course of their disease. If regression analyses are then performed on the resulting observational data, there are particularities that should be taken into account.

Evaluation of model quality

While in clinical trials models are generally prespecified and not selected based on results, in observational studies the model is at times modified to achieve the best possible fit to the data. The square of the correlation coefficient, the so-called coefficient of determination (R2), is often estimated as part of a linear regression. It describes the proportion of variation in the values of the outcome variable that can be explained by the regression model. Hence, a value close to 1 is considered very good, while a value close to 0 is considered poor. If in our illustrative model a coefficient of determination of 0.63 is estimated, this means that 63% of the variation in the ΔT25FW score can be explained by the model. Examples of other measures of model quality include the adjusted coefficient of determination and Akaike's Information Criterion; further measures are described by Moons et al. (14), among others.
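The coefficient of determination can be computed directly from the observed values and the model's predictions. A minimal sketch (variable and function names are ours):

```python
import numpy as np

def r_squared(y_observed, y_predicted):
    """R² = 1 − SS_res / SS_tot: the share of outcome variation
    explained by the regression model."""
    y = np.asarray(y_observed, dtype=float)
    y_hat = np.asarray(y_predicted, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return 1 - ss_res / ss_tot

# A model that always predicts the mean explains nothing (R² = 0);
# a model that reproduces every observation exactly explains everything (R² = 1).
```

Note that R² increases mechanically as predictors are added, which is why the adjusted coefficient of determination is often preferred for comparing models.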

Various measures have been proposed in the literature to estimate the quality of other regression models (e.g. Hosmer and Lemeshow [21] and Moons et al. [14]); some of these measures are similar to those used for the linear regression model (e.g. the pseudo coefficient of determination and a prognostic index), while others are derived from measures of diagnostic accuracy (e.g. sensitivity and specificity). These parameters also indicate a better diagnostic accuracy the closer they are to 1.

Good and poor adjustment variables: confounders, colliders, mediators

The key difference between randomized controlled trials and epidemiological observational studies, such as cohort studies, is that statements on causality are only possible to a limited extent, due to potential confounding variables (confounders). While in randomized clinical trials the receipt of an intervention is determined by chance, in observational studies confounders frequently influence both the intervention/exposure variable and the outcome variable. Here, a distinction must be made between independent variables in general and confounders.

Independent variables are any variables included as predictors in a regression model, whereas confounders are characterized by a directed dependency structure: they are assumed to have a causal influence on both exposure/treatment and the outcome variable (common cause). Under this assumption, confounding can also be eliminated in observational studies by adjusting for all relevant confounders within a regression model; in this way, causal interpretations of regression coefficients are theoretically possible, too (22). However, the prerequisite for this is that all relevant confounders are actually taken into account, which cannot be proven. For this reason, the German Institute for Quality and Efficiency in Health Care (IQWiG, Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen), for example, uses statistical significance tests with null hypotheses shifted by a minimum difference (shifted null hypothesis) for benefit assessments based on non-randomized studies in order to take this residual uncertainty into account (23). Further methods of dealing with confounding have already been covered in this series of articles (stratification [24] and propensity score matching [4]). However, all of these methods are based on the uncertain assumption that confounding can be completely eliminated. For further discussion of whether and under what assumptions causal interpretations from observational studies are possible, please refer to the literature (25, 26). In any case, even in observational studies it is not advisable simply to include all available independent variables in the regression model when a causal interpretation of the regression coefficients is intended.

In this series of articles, a type of bias was already discussed that can arise through the inclusion of variables in multivariable models: the collider bias (27). A further bias in the estimation of the total effect of an independent variable can arise if adjustments are made for variables that are located in the causal structure between exposure and outcome; such variables are called mediators. These three basic structures can be illustrated in causal diagrams (directed acyclic graphs, DAGs). For our MS research example, our DAG is based on a review by Koch-Henriksen et al. 2021 (28). Apart from the type of treatment, it is important to establish whether the variables “number of relapses during the 12 months before study start”, “adherence” (measured based on the number of tablets taken), and “frequency of follow-up visits” should be included in a multivariable regression model (Figure 2).

Figure 2.

Figure 2

Three basic structures in causal diagrams, using the multiple sclerosis (MS) study as an example: Confounding, mediator and collider structures and the handling of these predictor variables in regression models.

E = predictor variable, X = exposure/intervention, Y = outcome variable

Confounding structure

It is necessary to adjust for confounders. In our example, the number of relapses in the 12 months prior to study inclusion can influence the decision on the type of treatment and at the same time be a factor influencing MS progression; therefore, we should adjust for this variable in a regression model.

Mediator structure

To estimate the overall effect of an exposure/treatment on an outcome variable, it is important not to adjust for mediators. To investigate the mechanisms of an effect, a mediation analysis is required, separating the overall effect into a direct and an indirect effect (29). In our example, the type of treatment can have an effect on adherence which in turn can have an effect on MS progression. To estimate the overall treatment effect, we should not adjust for the variable adherence as a mediator in a regression model.

Collider structure

It is important not to adjust for colliders. In our example, the type of treatment and the progression of MS can influence how often patients present for follow-up visits. Thus, the frequency of follow-up visits is a collider for which no adjustment should be made in a regression model.

Summary

In our example, therefore, the only adjustment to be made is for the variable “number of relapses during the 12 months before study inclusion”, as this is the only confounding variable. In a real-world setting, it can be assumed that the DAG is considerably more complex than ours.
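The distorting effect of the collider structure can be demonstrated with a simulation. The data-generating mechanism below is our own and deliberately gives the treatment no true effect on the outcome; adjusting for the collider nevertheless produces a spurious effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
treat = rng.integers(0, 2, n).astype(float)   # randomized treatment
outcome = rng.normal(0.0, 1.0, n)             # truly unaffected by treatment
# Collider: influenced by both treatment and outcome
# (analogous to the frequency of follow-up visits in the MS example)
collider = treat + outcome + rng.normal(0.0, 1.0, n)

# Unadjusted model: the estimated treatment effect is close to the truth (0)
b_unadjusted = np.polyfit(treat, outcome, deg=1)[0]

# Wrongly adjusting for the collider induces a spurious negative effect
X = np.column_stack([np.ones(n), treat, collider])
b_adjusted = np.linalg.lstsq(X, outcome, rcond=None)[0][1]   # ≈ −0.5
```

Conditioning on the collider opens a non-causal path between treatment and outcome, which is why no adjustment should be made for such variables.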

Ratio of number of cases to number of variables

The sample size has a major impact on the estimation of a regression model.

The more variables are to be included in a multivariable regression model, the larger the sample should be. Rules of thumb can serve as a simple heuristic for estimation. In this regard, the information in the literature is inconsistent; however, the number of cases should be 10 to 20 times as large as the number of model parameters to be estimated (1, 30, 31). The number of observations in the rarer category or the number of events are important for the logistic regression and the Cox regression. In the literature, a value of at least 10 events per variable is given (32, 33). If during a follow-up period, an event has occurred in 20 of 200 persons, for example, the maximum number of independent (predictor) variables would be two in a logistic regression/Cox regression. However, simulation studies have shown that such simple rules of thumb for determining the number of variables do not accurately reflect complex data structures where the correlations of the predictors and the order of magnitude of the regression coefficients should also be taken into account. For further discussion of this topic, please refer to van Smeden et al. and Courvoisier (34, 35), specifically for prediction models with continuous, dichotomous and time-to-event outcome variables to Riley et al. (36, 37).
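The events-per-variable rule of thumb from the text can be expressed in one line; the function name is ours, and, as the text notes, such heuristics do not replace a proper sample size calculation:

```python
def max_predictors(n_events, events_per_variable=10):
    """Rule-of-thumb upper bound on the number of predictor variables
    in a logistic or Cox regression (at least 10 events per variable)."""
    return n_events // events_per_variable

# Example from the text: 20 events among 200 persons
# → at most two predictor variables
limit = max_predictors(20)  # → 2
```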

Common mistakes

There are typical pitfalls in the estimation and subsequent reporting of a regression model.

First, it should be noted that the estimation of effect sizes and the quality of a regression model are based on various assumptions, which are outlined in conjunction with the individual models in the eMethods. In practice, these assumptions must always be checked in order to ensure the validity of the findings from a regression model.

Furthermore, the purpose of a regression model may vary. If, for example, the primary goal is to interpret the relationship of individual predictor variables with the outcome variable, adjusted for other predictor variables, the p-value in particular is often reported. It should be noted that any statement about the significance of each predictor variable represents an individual statistical test. The interpretation of several predictor variables therefore poses the problem of multiple testing (38), which requires adjustment of the significance level or other possible solutions. As a rule, effect estimates and confidence intervals are reported for the predictor variables. However, if prediction is the primary purpose of the model, the parameters of the individual predictor variables are less important than the overall quality of the model.

Finally, the fact that a regression model is just a model adjusted to the data in the best possible way must be borne in mind both during development and interpretation. It is not possible to establish causality from the estimated statistical parameters; this can only be achieved with a suitable study design. If a data-driven model is chosen, i.e. if predictor variables are selected in a way that optimum model quality is achieved, there is a risk of overfitting. This means that both model quality and the association of individual predictor variables are overestimated. Especially in observational studies, such an approach is common practice; thus, it is necessary to validate the resulting models both internally and in independent data (14). Ideally, however, the model is specified a priori, using a hypothesis-led approach.

Conclusion

The aim of this article was to provide an introduction to the basic components of regression analyses in observational studies, using a study on multiple sclerosis as an example. In medical research, regression analyses play an important role as a statistical evaluation method due to the flexibility of their use. They have therefore been the subject of guidance articles in various journals, including Nature Methods and the British Medical Journal’s Statistics Notes (39, 40). The fact that our article focuses on regression analyses in observational studies highlights the importance of collaboration between scientists with domain knowledge and modelers. Causal diagrams form an increasingly important basis for the evaluation of observational studies; they should be prepared jointly and should distinguish between confounders, mediators, and colliders. Beyond the topics covered in our article, please refer to the literature for further discussion of advanced methods, such as the modeling of the functional form of continuous predictor variables or the advantages and disadvantages of various variable selection options (31, 41). In conclusion, basic knowledge of regression models is necessary for the understanding of many scientific papers.

Questions on the article in issue 4/2024:

Regression Analyses and Their Particularities in Observational Studies

The closing date for entries is 22 February 2025. You may select only one answer per question.

Please select the answer that is most appropriate.

Question 1

Which of the following terms is not mentioned in the manuscript as a synonym for the term “outcome variable”?

  1. Response variable

  2. Endpoint

  3. Outcome

  4. Dependent variable

  5. Explanatory variable

Question 2

What type of regression is performed if the outcome variable is a time-to-event variable?

  1. Logistic regression

  2. Linear regression

  3. Cox regression

  4. Poisson regression

  5. Negative binomial regression

Question 3

What type of regression is calculated for a dichotomously scaled outcome variable (yes/no)?

  1. Negative binomial regression

  2. Logistic regression

  3. Poisson regression

  4. Cox regression

  5. Linear regression

Question 4

What is the difference between a univariate and a multivariate regression model?

  1. The number of included independent variables

  2. The type of scale of the outcome variable

  3. The number of included patients/subjects

  4. The number of modeled outcome variables

  5. The type of scales of the independent variables

Question 5

What cause for a potential “overfitting” of the regression model is mentioned in the article?

  1. The model was chosen using a data-driven approach.

  2. The model does not include enough predictor variables.

  3. The data were collected over a very long period of time.

  4. The model was specified a priori.

  5. The model was specified using a hypothesis-led approach.

Question 6

How can the problem of multiple testing in regression models be addressed so that more reliable significance statements can be made?

  1. Inclusion of further predictor variables

  2. Analysis of further outcome variables

  3. Rounding off the p-values

  4. Adding up the p-values to a common total p value

  5. Adjustment of the significance level

Question 7

What heuristic for approximate estimation (rule of thumb) is mentioned in the article to help choosing the sample size for a multivariable regression model?

  1. The number of cases should be equal to the number of model parameters to be estimated.

  2. The number of cases should be 2 to 3 times as high as the number of model parameters to be estimated.

  3. The number of cases should be 10 to 20 times as high as the number of model parameters to be estimated.

  4. The number of cases should be 100 to 500 times as high as the number of model parameters to be estimated.

  5. The number of cases should be adjusted over the course of the analysis until significant results are obtained.

Question 8

What is the term for variables that lie in the causal structure between exposure and outcome?

  1. Colliders

  2. Mediators

  3. Aggravators

  4. Contradictors

  5. Disruptors

Question 9

In a linear regression, what does an R² of 0.98 indicate?

  1. A high model quality of the regression model

  2. A large effect size of the predictor variable under consideration

  3. A small effect size of the predictor variable under consideration

  4. A limited explanation of the variation in the values of the outcome variable by the model

  5. A moderate effect size of the variable under consideration

Question 10

In the article, a fictitious study on an experimental intervention for multiple sclerosis (MS) is described. Which variable is mentioned there as an example of a confounder?

  1. The treatment adherence of the patients

  2. The frequency of follow-up visits

  3. The progression of MS

  4. The number of relapses in the 12 months prior to study inclusion

  5. The patient satisfaction after the end of the intervention

Acknowledgments

Translated from the original German by Ralf Thoene, M.D.

Footnotes

Conflict of interest statement

The authors declare that no conflict of interest exists.
