---
title: "MAPPIT-C: Self-Efficacy"
date:  "`r format(Sys.time(), '%Y-%B-%d %H:%M')`"
output: 
  html_document:
    toc: true
    toc_float: true
    number_sections: true
---

```{r, echo=FALSE, message=FALSE, warning=FALSE}
library(knitr)
opts_chunk$set(echo=FALSE, message=FALSE, warning=FALSE,
               fig.width = 6, fig.height = 5)
options(width = 150)

library(stringr)
library(tibble)
library(tidyr)
library(purrr)
library(dplyr)
library(ggplot2)
library(pander)
library(metafor)
library(clubSandwich)
source("meta-regression-function.R")


dat <- readRDS("data/Self efficacy data.rds")

pop_covariates <-
  dat %>% 
  select(studyID, age_M, female_pct) %>%
  distinct()

pop_means <- 
  pop_covariates %>% 
  summarise(
    age = mean(age_M, na.rm = TRUE),
    female = mean(female_pct, na.rm = TRUE)
  )

dat <- 
  dat %>%
  mutate(
    age_M_cent = age_M - pop_means$age,
    female_pct_cent = female_pct - pop_means$female
  )

```

# Overall effects

```{r}

sample_size_summary <-
  dat %>%
  group_by(studyID) %>%
  summarise(effects = n(),
            participants = max(n_post)) %>%
  summarise(studies = n(),
            min_effects = min(effects),
            median_effects = median(effects),
            max_effects = max(effects),
            effects = sum(effects),
            participants = sum(participants))
```

The analysis is based on a sample of `r sample_size_summary$studies` studies and `r sample_size_summary$effects` effect size estimates, with a combined sample size of `r sample_size_summary$participants` participants. Each study contributed between `r sample_size_summary$min_effects` and  `r sample_size_summary$max_effects` effect size estimates, with a median of  `r sample_size_summary$median_effects` effect sizes per study.


## Sample size

The following plot displays the distribution of study sample sizes (at post-test). 

```{r, fig.height = 3, fig.width = 6}

ggplot(dat, aes(n_post)) + 
  geom_density(fill = "purple") + 
  geom_blank(aes(x = 0, y = 0)) + 
  theme_minimal() + 
  labs(x = "Total sample size", y = "", title = "Empirical distribution of sample sizes")
```

The following table reports the effect size estimates based on samples of 200 participants or more:

```{r}
dat %>%
  filter(n_post >= 200) %>%
  arrange(studyID, weeks_from_baseline) %>%
  select(studyID, outcome, weeks_from_baseline, n_post, g, SE) %>%
  as.data.frame() %>%
  pander(split.tables = 150, round = 3)
```

## Effect size distribution

The following plot displays the distribution of effect size estimates. 

```{r, fig.height = 3, fig.width = 6}
qrtls <- quantile(dat$g, c(.25, .75))
fences <-  qrtls + 3 * diff(qrtls) * c(-1, 1)
fence_dat <- data.frame(qrtl = qrtls, fence = fences)

ggplot(dat, aes(g)) + 
  geom_density(fill = "purple") + 
  geom_blank(aes(x = 0, y = 0)) + 
  geom_vline(data = fence_dat, aes(xintercept = qrtl), linetype = "solid") + 
  geom_vline(data = fence_dat, aes(xintercept = fence), linetype = "dashed") + 
  theme_minimal() + 
  labs(x = "Effect size estimate", y = "", title = "Empirical distribution of effect size estimates (unweighted)")

n_outliers <- sum(dat$g < fences[1] | dat$g > fences[2])

```

Several positive outliers are apparent. We used Tukey's (1977) definition of outliers as values below the 1st quartile minus 3 times the inter-quartile range ($d = `r round(fences[1], 2)`$) or above the 3rd quartile plus 3 times the interquartile range ($d = `r round(fences[2], 2)`$). These thresholds are displayed as dashed lines in the density plot. A total of `r n_outliers` outlying effect size estimates were re-coded (i.e., Windsorized) to the corresponding lower or upper fence values. The following table reports the original and trimmed values of these effect size estimates:

```{r}
dat %>%
  filter(g < fences[1] | g > fences[2]) %>%
  arrange(studyID, weeks_from_baseline) %>%
  mutate(g_trimmed = pmin(pmax(g, fences[1]), fences[2])) %>%
  select(studyID, outcome, weeks_from_baseline, g_original = g, g_trimmed, SE) %>%
  as.data.frame() %>%
  pander(split.tables = 150, round = 3)

dat <- 
  dat %>%
  mutate(g = pmin(pmax(g, fences[1]), fences[2])) %>%
  arrange(studyID, outcome)
```

## Summary meta-analysis

```{r}

# overall 

robu_overall <- robu_fit(dat = dat, "g ~ 1", studyID = "studyID", V = "Vg", rho = 0.7, loo = TRUE)
robu_overall

report_overall <- tidy_overall(robu_overall)
```

Across all included studies, the overall average effect size was estimated as `r round(robu_overall$b, 3)`, 95% CI [`r round(robu_overall$ci.lb, 3)`, `r round(robu_overall$ci.ub, 3)`]. The between-study standard deviation of effects was estimated as $\tau = `r round(sqrt(robu_overall$sigma2), 3)`$, indicating moderate-to-large heterogeneity $(I^2 = `r round(robu_overall$I_sq)`\%)$. One way of characterizing the degree of variability is to consider that, if the true effects are normally distributed then about 2/3 of them should be within +/- 1 SD of the mean effect. So according to our estimates, 2/3 of true effects should be between `r with(robu_overall, round(b - sqrt(sigma2), 3))` and `r with(robu_overall, round(b + sqrt(sigma2), 3))`. Thus, one should expect substantial variation in the effects of psycho-social interventions on self-efficacy outcomes among cancer patients. 

## Correlated effects sensitivity analysis

The above results are based on assumed correlation of .7 among effect size estimates drawn from the same study. Hedges, Tipton, and Johnson (2010) recommend conducting sensitivity analysis for varying values of this correlation. The table below reports the average effect size estimate, 95% confidence interval, and $\hat\tau$ estimate for assumed correlations varying from .0 to .9.

```{r}
rho <- seq(0, 0.9, 0.1)
names(rho) <- rho
f <- function(r) tidy(robu_fit(dat = dat, "g ~ 1", studyID = "studyID", V = "Vg", rho = r))

map_dfr(rho, f, .id = "rho") %>%
  select(-Coef) %>%
  as.data.frame() %>%
  pander()
```

## Outlier sensitivity analysis

We conducted a leave-one-out sensitivity analysis in order to assess the extent to which the estimates of the overall average effect size and between-study variation were affected by the exclusion of a single study. The following figure plots the overall average effect size estimate (left column) and estimated between-study SD (right column) computed after excluding one study from the analysis. The vertical blue lines indicate the estimates based on the full set of `r sample_size_summary$studies` studies. 

```{r, fig.width = 6, fig.height = 12}
loo_dat <-
  loo_summary(robu_overall) %>%
  select(excluded, b, CI_L, CI_U, tau) %>%
  gather("est","val", b, tau) %>%
  mutate(
    CI_L = ifelse(est == "b", CI_L, NA),
    CI_U = ifelse(est == "b", CI_U, NA),
    est = factor(est, levels = c("b","tau"), labels = c("Average Effect Size", "Between-Study SD"))
  )

overall_loo <- filter(loo_dat, excluded=="none")
loo_dat <- filter(loo_dat, excluded != "none") %>% droplevels()

ggplot(loo_dat, aes(x = excluded, y = val)) + 
  geom_point() + 
  geom_segment(aes(y = CI_L, yend = CI_U, x = excluded, xend = excluded)) + 
  geom_hline(data = overall_loo, aes(yintercept = val), color = "blue") + 
  facet_wrap(~ est, scales = "free_x") + 
  scale_x_discrete(limits = rev(levels(loo_dat$excluded))) + 
  coord_flip() + 
  expand_limits(y = 0) + 
  theme_light() + 
  labs(x = "Excluded study", y = "") + 
  theme(strip.text = element_text(color = "black"))
```

Excluding Reif (2013) has a moderate influence on the overall summary results, lowering the average effect size estimate to `r loo_dat %>% filter(excluded == "Reif 2013", est == "Average Effect Size") %>% pull(val) %>% formatC(digits = 3, format = "f")` and reducing estimated between-study heterogeneity to `r loo_dat %>% filter(excluded == "Reif 2013", est == "Between-Study SD") %>% pull(val) %>% formatC(digits = 3, format = "f")`.

# Risk of bias 

## Attrition

```{r}
attrition_dat <- 
  dat %>%
  mutate(
    attrition_effect_level = 100 * (1 - n_post / n_pre),
    ROB_attrition = cut(attrition_effect_level, 
                        breaks = c(0, 10, 20, 100), 
                        labels = c("Low","Unclear","High"),
                        include.lowest = TRUE)
  )
```

Attrition was calculated as the difference in total sample size between baseline and a given follow-up point, divided by the total sample size at baseline. The figure below depicts the distribution of attrition levels across the `r sample_size_summary$effects` included effect sizes.

```{r, fig.width = 5, fig.height = 3}
ggplot(attrition_dat, aes(x = attrition_effect_level)) + 
  geom_histogram(binwidth = 2, boundary = 0) + 
  theme_minimal() + 
  labs(x = "Overall attrition (%)")
```

```{r, include = FALSE}
# The following effect size estimates involved attrition of greater than 40%:

attrition_dat %>%
  filter(attrition_effect_level > 40) %>%
  arrange(studyID, weeks_from_baseline) %>%
  select(studyID, outcome, weeks_from_baseline, n_pre, n_post, `attrition (%)` = attrition_effect_level, g, SE) %>%
  as.data.frame() %>%
  pander(split.tables = 150, round = 3)

```

```{r}
attrition_fit <- robu_fit(dat = attrition_dat, "g ~ attrition_effect_level", studyID = "studyID", V = "Vg")
```

The figure below plots effect size estimates versus overall attrition. The grey line represents the expected average effect size as a linear function of the degree of overall attrition. Based on this linear meta-regression model, higher overall attrition was associated with _lower_ average effect size estimates, although the effect is not statistically distinguishable from zero at the 5% level ($\beta = `r round(attrition_fit$b[[2]], 2)`$, 95% CI: [`r round(attrition_fit$ci.lb[[2]], 2)`, `r round(attrition_fit$ci.ub[[2]], 2)`]).

```{r}
attrition_dat$g_pred <- attrition_fit$b[[1]] + attrition_fit$b[[2]] * attrition_dat$attrition_effect_level
  
ggplot(attrition_dat, aes(attrition_effect_level)) + 
  geom_point(aes(y = g, size = 1 / SE, color = studyID)) + 
  geom_line(aes(y = g_pred), size = 1, color = "grey") + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Overall attrition (%)", y = "Effect size")
```

```{r}
attrition_fit
```

In analysis at the level of the study (aggregating across effect size estimates), we categorize studie as having low risk of bias if they include at least one effect size estimate with attrition of less than 10%, unclear risk of bias if they include at least one effect size estimate with attrition of less than 20%, and high risk of bias otherwise. 

## Other dimensions of risk-of-bias

The following table reports the number of studies by rating for each of four dimensions of risk-of-bias. 

```{r}
ROB_dat <- 
  attrition_dat %>%
  group_by(studyID) %>%
  summarise(ROB_attrition = min(as.numeric(ROB_attrition))) %>%
  mutate(
    ROB_attrition = factor(ROB_attrition, levels = 1:3, labels = c("Low","Unclear","High"))
  )

ROB_cats <- c("ROB_seq_gen_clean", "ROB_alloc_clean", "ROB_attrition", "ROB_outcome_reporting_clean")
ROB_labels <- c("Sequence generation", "Allocation concealment","Attrition", "Outcome reporting")
 
dat %>%
  group_by(studyID) %>%
  summarise_at(vars(ROB_cats[-3]), unique) %>%
  left_join(ROB_dat, by = "studyID") %>%
  mutate(ROB_attrition = as.character(ROB_attrition)) %>%
  gather("category", "rating", starts_with("ROB")) %>%
  mutate(category = factor(category, levels = ROB_cats, labels = ROB_labels)) %>%
  group_by(category, rating) %>%
  count() %>%
  spread(rating, n) %>%
  select(`ROB category` = category, Low, Unclear, High) %>%
  as.data.frame() %>%
  pander()

```

The following table reports the average effect size estimates for studies rated as low risk-of-bias versus studies rated as unclear or high risk-of-bias, for each of the four dimensions.  The difference between average effects for low versus unclear/high is not statistically distinguishable from zero for any of the dimensions. 

```{r}

test_ROB <- function(data) {
  res <- robu_fit(dat = data, "g ~ 0 + rating", studyID = "studyID", V = "Vg", random = "~ rating | id", struct = "DIAG")
  test <- robu_fit(dat = data, "g ~ rating", studyID = "studyID", V = "Vg", random = "~ rating | id", struct = "DIAG")
  tibble(
    Low_Est = res$b[[1]], Low_SE = res$se[[1]], 
    Unclear_Est = res$b[[2]], Unclear_SE = res$se[[2]],
    Difference = test$b[[2]], Difference_SE = test$se[[2]], Difference_pval = test$pval[[2]]
  )
}
  
ROB_ests <- 
  attrition_dat %>%
  select(studyID, ROB_cats, g, Vg) %>%
  mutate(ROB_attrition = as.character(ROB_attrition)) %>%
  gather("category", "rating", starts_with("ROB")) %>%
  mutate(
    category = factor(category, levels = ROB_cats, labels = ROB_labels),
    rating = ifelse(rating == "Low", "Low", "Unclear/High")
  ) %>%
  group_by(category) %>%
  do(test_ROB(.))
  
ROB_ests %>%
  as.data.frame() %>%
  pander(split.tables = 150, round = 3)

```

## Joint model

Rather than examining each factor separately, we can also consider differences between studies with different risks of bias through a joint meta-regression. Here I used the continuous measure of percentage attrition rather than the threshold values. The model also includes additive effects for the other dimensions of risk of bias, equal to one if the study was at unclear or high risk of bias for a given dimension. The intercept of the model can then be interpreted as the overall average effect size for a study with no attrition and with low risk of bias in all three of the further dimensions.

```{r}
ROB_dat <- 
  attrition_dat %>%
  mutate_at(vars(starts_with("ROB")), function(x) ifelse(x=="Low", "Low", "Unclear/high"))

res_full <- robu_fit(dat = ROB_dat, "g ~ attrition_overall + ROB_seq_gen_clean + ROB_alloc_clean + ROB_outcome_reporting_clean", studyID = "studyID", V = "Vg")
names(res_full$b) <- c("Average effect size - low ROB", "Attrition (%)", 
                       "U/H ROB - sequence generation", "U/H ROB - allocation concealment",
                       "U/H ROB - outcome reporting")
res_full
```

## Successive inclusion criteria

An alternative to joint modeling of risk-of-bias dimensions is to calculate overall average effect size estimates repeatedly, applying successively stronger inclusion criteria at each step. This approach is more of a sensitivity analysis, illustrating how the overall average effect estimate is influenced by the stringency of inclusion critera.

```{r}
overall_fit <- function(data) {
  robu_fit(dat = data, "g ~ 1", studyID = "studyID", V = "Vg") %>%
    tidy_overall()
}

A_dat <- attrition_dat %>% 
  mutate(step = "A", criteria = "All studies")
B_dat <- filter(A_dat, ROB_outcome_reporting_clean == "Low") %>% 
  mutate(step = "B", criteria = "Low ROB for outcome reporting")
C_dat <- filter(B_dat, ROB_seq_gen_clean == "Low") %>% 
  mutate(step = "C", criteria = "+ Low ROB for sequence generation")
D_dat <- filter(C_dat, ROB_alloc_clean == "Low") %>% 
  mutate(step = "D", criteria = "+ Low ROB for allocation concealment")
E_dat <- filter(D_dat, attrition_effect_level < 10) %>% 
  mutate(step = "E", criteria = "+ Overall attrition < 10%")

ROB_inclusion_results <- 
  bind_rows(A_dat, B_dat, C_dat, D_dat, E_dat) %>%
  group_by(step, criteria) %>%
  do(overall_fit(.))

ROB_inclusion_results %>%
  as.data.frame() %>%
  pander(split.tables = 150, round = 3)

```

# Outcome reporting/small sample bias

```{r}
# x_max <- max(sqrt(dat$V_diff)) + 0.01
x_max <- 0.61
y_range <- range(dat$g) + c(-0.1,0.1)

funnel_plot <- 
  dat %>%
  filter(sqrt(V_diff) < 0.6) %>%
  ggplot(aes(sqrt(V_diff), g, size = 1 / V_diff)) + 
  geom_blank(aes(x = 0, y = 0)) + 
  coord_cartesian(xlim = c(0, x_max), ylim = y_range, expand = FALSE) + 
  geom_hline(yintercept = report_overall$Est, color = "black") + 
  geom_abline(intercept = report_overall$Est, slope = qnorm(0.975), color = "black") + 
  geom_abline(intercept = report_overall$Est, slope = -qnorm(0.975), color = "black") + 
  theme_minimal() + 
  theme(legend.position = "bottom") + 
  labs(x = "Scaled Std. Error", y = "Effect size", color = "") + 
  guides(size = FALSE)

funnel_plot + 
  geom_point(aes(color = outcome_group))

```

## Linear in SE

```{r}

# funnel_plot + 
#   geom_point() + 
#     geom_smooth(aes(sqrt(V_diff), g, weight = 1 / Vg), method = "lm", se = FALSE)


robu_EggerA <- robu_fit(dat = filter(dat, sqrt(V_diff) < 0.6),
                        "g ~ sqrt(V_diff)", 
                        studyID = "studyID", V = "Vg", method = "FE")

robu_EggerA

```
Egger's regression test is not statistically significant. By this method, there is not evidence of small-sample biases. 

## Linear in $SE^2$

```{r}

# funnel_plot + 
#   geom_point() + 
#   geom_smooth(aes(sqrt(V_diff), g, weight = 1 / Vg), method = "lm", formula = y ~ I(x^2), se = FALSE)

robu_EggerB <- robu_fit(dat = filter(dat, sqrt(V_diff) < 0.6), 
                        "g ~ V_diff", 
                        studyID = "studyID", V = "Vg", method = "FE")

robu_EggerB

```
A modified (quadratic) version of Egger's regression test yields the same conclusion. 

## Summary of large trials only

```{r}
big_dat <- filter(dat, n_post > 100)
robu_big_studies <- robu_fit(dat = big_dat, "g ~ 1", studyID = "studyID", V = "Vg")
```

Limiting the analytic sample to the `r length(unique(big_dat$studyID))` studies with large samples at post test (post-test total sample size > 100) leads to very similar overall effect size estimates and heterogeneity estimates $(I^2 = `r round(robu_big_studies$I_sq)` \%)$.

```{r}
robu_big_studies
```

# Moderators

```{r}

# follow-up time

dat_followup <- 
  dat %>% 
  filter(!is.na(weeks_from_baseline)) %>%
  mutate(weeks_cent = weeks_from_baseline - 12)

robu_followup <- robu_fit(dat = dat_followup, "g ~ weeks_cent", studyID = "studyID", V = "Vg")
F_followup <- Wald_test(robu_followup, constraints = 2, vcov = "CR2")
F_followup$q <- 1

labs <- tibble(lab = c("Average effect at 12 weeks", "Weeks"),
                   studies = NA,
                   effects = NA)
report_followup <- report_results(robu_followup, labs = labs, title = "Follow-up time", Fstats = F_followup)

# age

robu_age <- robu_fit(dat = dat, "g ~ age_M_cent", studyID = "studyID", V = "Vg")
F_age <- Wald_test(robu_age, constraints = 2, vcov = "CR2")
F_age$q <- 1

labs <- tibble(lab = c("Average", "Age"),
                   studies = NA,
                   effects = NA)
report_age <- report_results(robu_age, labs = labs, title = "Age", Fstats = F_age)

# gender

robu_female <- robu_fit(dat = dat, "g ~ female_pct_cent", studyID = "studyID", V = "Vg")
F_female <- Wald_test(robu_female, constraints = 2, vcov = "CR2")
F_female$q <- 1

labs <- tibble(lab = c("Average","Percent female"),
                   studies = NA,
                   effects = NA)
report_female <- report_results(robu_female, labs = labs, title = "Gender", Fstats = F_female)

# cancer type

cancer_type_summary <- 
  dat %>% 
  filter(type != "not reported") %>%
  group_by(type) %>% 
  summarise(studies = length(unique(studyID)), effects = n()) %>%
  rename(lab = type)

robu_cancer_type <- robu_fit(dat = filter(dat, type != "not reported"),
                             frml = "g ~ 0 + type", 
                             studyID = "studyID", V = "Vg", random = "~ type | id", struct="DIAG")
F_cancer_type <- test_joint(robu_cancer_type, 1:2)

report_cancer_type <- report_results(robu_cancer_type, labs = cancer_type_summary, 
                                     title = "Cancer type", Fstats = F_cancer_type)

# cancer stage

cancer_stage_summary <- 
  dat %>% 
  group_by(stage) %>% 
  summarise(studies = length(unique(studyID)), effects = n()) %>%
  rename(lab = stage)

robu_cancer_stage <- robu_fit(dat = dat, frml = "g ~ 0 + stage",
                              studyID = "studyID", V = "Vg", random = "~ stage | id", struct="DIAG")
F_cancer_stage <- test_joint(robu_cancer_stage, 1:3)

report_cancer_stage <- report_results(robu_cancer_stage, labs = cancer_stage_summary, 
                                     title = "Cancer stage", Fstats = F_cancer_stage)

# cancer phase

cancer_phase_summary <- 
  dat %>% 
  group_by(phase) %>% 
  summarise(studies = length(unique(studyID)), effects = n()) %>%
  rename(lab = phase)

robu_cancer_phase <- robu_fit(dat = dat, frml = "g ~ 0 + phase",
                              studyID = "studyID", V = "Vg", random = "~ phase | id", struct="DIAG")
F_cancer_phase <- test_joint(robu_cancer_phase, 1:3)

report_cancer_phase <- report_results(robu_cancer_phase, labs = cancer_phase_summary, 
                                     title = "Cancer phase", Fstats = F_cancer_phase)

# delivery format

format_summary <- 
  dat %>% 
  group_by(in_person) %>% 
  summarise(studies = length(unique(studyID)), effects = n()) %>%
  rename(lab = in_person)

robu_format <- robu_fit(dat = dat, frml = "g ~ 0 + in_person",
                              studyID = "studyID", V = "Vg", random = "~ in_person | id", struct="DIAG")
F_format <- test_joint(robu_format, 1:2)

report_format <- report_results(robu_format, labs = format_summary, 
                                title = "Delivery format", Fstats = F_format)

# intervention focus

focus_summary <- 
  dat %>% 
  group_by(focus) %>% 
  summarise(studies = length(unique(studyID)), effects = n()) %>%
  rename(lab = focus)

robu_focus <- robu_fit(dat = dat, frml = "g ~ 0 + focus",
                              studyID = "studyID", V = "Vg", random = "~ focus | id", struct="DIAG")
F_focus <- test_joint(robu_focus, 1:3)

report_focus <- report_results(robu_focus, labs = focus_summary, 
                                title = "Intervention focus", Fstats = F_focus)

# intervention class

int_class_summary <- 
  dat %>% 
  group_by(intervention_class) %>% 
  summarise(
    studies = n_distinct(studyID), 
    effects = n()
  ) %>%
  rename(lab = intervention_class)

big_classes <- with(int_class_summary, lab[studies >= 5])
big_class_dat <- filter(dat, intervention_class %in% big_classes)

robu_int_class <- robu_fit(dat = dat, frml = "g ~ 0 + intervention_class",
                              studyID = "studyID", V = "Vg", random = "~ intervention_class | id", struct="DIAG")
F_int_class <- test_joint(robu_int_class, 1:9)

report_int_class <- report_results(robu_int_class, 
                                   labs = int_class_summary, 
                                   title = "Intervention class", Fstats = F_int_class)

# intervention target

target_summary <- 
  dat %>% 
  group_by(intervention_target) %>% 
  summarise(studies = length(unique(studyID)), effects = n()) %>%
  rename(lab = intervention_target)

robu_target <- robu_fit(dat = dat, frml = "g ~ 0 + intervention_target",
                              studyID = "studyID", V = "Vg", random = "~ intervention_target | id", struct="DIAG")
F_target <- test_joint(robu_target, 1:2)

report_target <- report_results(robu_target, labs = target_summary, 
                                title = "Intervention target", Fstats = F_target)

# intervention skills

skills_summary <- 
  dat %>% 
  group_by(skills) %>% 
  summarise(studies = length(unique(studyID)), effects = n()) %>%
  rename(lab = skills)

robu_skills <- robu_fit(dat = dat, frml = "g ~ 0 + skills",
                              studyID = "studyID", V = "Vg", random = "~ skills | id", struct="DIAG")
F_skills <- test_joint(robu_skills, 1:2)

report_skills <- report_results(robu_skills, labs = skills_summary, 
                                title = "Skills intervention", Fstats = F_skills)

# outcome group

outcome_group_summary <- 
  dat %>% 
  group_by(outcome_group) %>% 
  summarise(
    studies = n_distinct(studyID), 
    effects = n()
  ) %>%
  rename(lab = outcome_group)

robu_outcome_group <- robu_fit(dat = dat, frml = "g ~ 0 + outcome_group",
                              studyID = "studyID", V = "Vg", random = "~ outcome_group | id", struct="DIAG")
F_outcome_group <- test_joint(robu_outcome_group, 1:3)

report_outcome_group <- report_results(robu_outcome_group, labs = outcome_group_summary, 
                                title = "Outcome group", Fstats = F_outcome_group)



```

## Summary of moderator results

```{r, asis = TRUE}
report_blank <- tibble(lab = NA, studies = NA, effects = NA, 
                           Est = NA, SE = NA, CI_U = NA, CI_L = NA, 
                           tau = NA, Isq = NA, Fstat = NA, df1 = NA, df2 = NA, p = NA)

reports <- list(
  report_overall, report_blank,
  report_followup, report_blank,
  report_age, report_blank,
  report_female, report_blank,
  report_cancer_type, report_blank,
  report_cancer_stage, report_blank,
  report_cancer_phase, report_blank,
  report_format, report_blank,
  report_focus, report_blank,
  report_int_class, report_blank,
  report_target, report_blank,
  report_skills, report_blank,
  report_outcome_group #, report_blank, report_outcome_cat
)

summary_table <- bind_rows(reports)

summary_rows <- c(1, 1 + cumsum(sapply(reports, nrow))[seq(2,length(reports),2)])

summary_table %>%
  mutate(
    Est = round(Est,3),
    SE = round(SE, 3),
    CI = ifelse(is.na(CI_L), "",paste0("[",round(CI_L, 3),", ", round(CI_U, 3),"]")),
    tau = round(tau, 3),
    Isq = round(Isq, 1),
    Fstat = round(Fstat, 3),
    df2 = round(df2, 1),
    p = format.pval(p, nsmall = 3, digits = 3, eps = 10^-3, na.form = "")
  ) %>%
  select(" " = lab, studies, effects, Est, SE, CI, tau, Isq, F = Fstat, df1, df2, "p-val" = p) %>%
  as.data.frame() %>%
  pander(
    missing = "", 
    split.table = 150, 
    justify = "lrrrrcrrrrrr", 
    emphasize.strong.rows = summary_rows
  )

```

## Follow-up time

The following table reports summary statistics for the follow-up times across included studies.

```{r}
summary(dat$weeks_from_baseline) %>%
  as.list() %>%
  as.data.frame() %>%
  mutate(
    Variable = "Follow-up time (weeks from baseline)",
    Studies = length(unique(dat$studyID)),
    Effects = sum(!is.na(dat$weeks_from_baseline)),
    SD = sd(pop_covariates$age_M, na.rm = TRUE)
  ) %>%
  select(Variable, Studies, Effects, Mean, SD, Min = Min., Q1 = X1st.Qu., Median, Q3 = X3rd.Qu., Max = Max., NAs = NA.s) %>%
  pander(split.table = 150)

```

```{r}

filter(dat, !is.na(weeks_from_baseline)) %>%
ggplot(aes(weeks_from_baseline, g, weight = 1 / Vg, size = 1 / SE)) + 
  geom_point(aes(color = studyID)) + 
  geom_smooth(method = "loess") + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Weeks from baseline", y = "Effect size")

robu_followup
```

Zoom in on effects for follow-up times between 1 and 30 weeks:

```{r}
filter(dat, weeks_from_baseline >= 1, weeks_from_baseline < 30) %>%
ggplot(aes(weeks_from_baseline, g, weight = 1 / Vg, size = 1 / SE)) + 
  geom_point(aes(color = studyID)) + 
  geom_smooth(method = "loess") + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Weeks from baseline", y = "Effect size")

robu_fit(dat = filter(dat_followup, weeks_from_baseline >= 1, weeks_from_baseline < 30),
         "g ~ weeks_cent", studyID = "studyID", V = "Vg")
```

## age

The following table reports summary statistics for the mean age of included samples.

```{r}
summary(pop_covariates$age_M) %>%
  as.list() %>%
  as.data.frame() %>%
  mutate(
    Variable = "Sample mean age",
    Studies = sum(!is.na(pop_covariates$age_M)),
    SD = sd(pop_covariates$age_M, na.rm = TRUE)
  ) %>%
  select(Variable, Studies, Mean, SD, Min = Min., Q1 = X1st.Qu., Median, Q3 = X3rd.Qu., Max = Max., NAs = NA.s) %>%
  pander(split.table = 150)

```

```{r}

ggplot(dat, aes(age_M, g, size = 1 / SE, color = studyID)) + 
  geom_point() + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Sample mean age", y = "Effect size")

robu_age
```

## gender

The following table reports summary statistics for the percentage of females in included samples.

```{r}
summary(pop_covariates$female_pct) %>%
  as.list() %>%
  as.data.frame() %>%
  mutate(
    Variable = "Sample % female",
    Studies = sum(!is.na(pop_covariates$female_pct)),
    SD = sd(pop_covariates$female_pct, na.rm = TRUE)
  ) %>%
  select(Variable, Studies, Mean, SD, Min = Min., Q1 = X1st.Qu., Median, Q3 = X3rd.Qu., Max = Max.) %>%
  pander(split.table = 150)
```

```{r}

ggplot(dat, aes(female_pct, g, size = 1 / SE, color = studyID)) + 
  geom_point() + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Sample % female", y = "Effect size")

robu_female
```

## Cancer type

```{r}
ggplot(dat, aes(type, g, size = 1 / SE, color = studyID)) + 
  geom_point() + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Cancer type", y = "Effect size")

robu_cancer_type
```

## Cancer stage

The joint test for differences among cancer stages excludes studies where the stage was not reported.

```{r}
ggplot(dat, aes(stage, g, size = 1 / SE, color = studyID)) + 
  geom_point() + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Cancer stage", y = "Effect size")

robu_cancer_stage
```

## Cancer phase

The joint test for differences among cancer phases excludes studies where the phase was not reported.

```{r}
ggplot(dat, aes(phase, g, size = 1 / SE, color = studyID)) + 
  geom_point() + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Cancer phase", y = "Effect size")

robu_cancer_phase
```

## Delivery format

```{r}

ggplot(dat, aes(in_person, g, size = 1 / SE, color = studyID)) + 
  geom_point() + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Intervention format", y = "Effect size")

robu_format

```

## Intervention focus

```{r}
ggplot(dat, aes(focus, g, size = 1 / SE, color = studyID)) + 
  geom_point() + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Intervention focus", y = "Effect size")

robu_focus

```

## Intervention class

```{r}

ggplot(dat, aes(intervention_class, g, size = 1 / SE, color = studyID)) + 
  geom_point() + 
  theme_minimal() + 
  theme(legend.position = "none", axis.text.x = element_text(angle = 90, hjust = 1)) + 
  labs(x = "Intervention class", y = "Effect size")

robu_int_class

```

```{r}

test_pairwise(robu_int_class, 1:9, bonf = FALSE) %>%
  mutate(
    p_adj = format.pval(pmin(1, p * 36), nsmall = 4, digits = 4),
    diff = format(diff, nsmall = 2, digits = 2),
    t = format(t, nsmall = 2, digits = 2),
    df = format(df, nsmall = 1, digits = 1),
    p = format.pval(p, nsmall = 4, digits = 4)
  ) %>%
  rename(`B - A` = diff, `t stat.` = t, `d.f.` = df, `p-value` = p, `adjusted p-value` = p_adj) %>%
  pander(
    missing = "", 
    split.table = 150, 
    justify = "llrrrrr"
  )

```


## Intervention target

```{r}

ggplot(dat, aes(intervention_target, g, size = 1 / SE, color = studyID)) + 
  geom_point() + 
  theme_minimal() + 
  theme(legend.position = "none", axis.text.x = element_text(angle = 90, hjust = 1)) + 
  labs(x = "Intervention target", y = "Effect size")

robu_target

```

## Skills 

```{r}

ggplot(dat, aes(skills, g, size = 1 / SE, color = studyID)) + 
  geom_point() + 
  theme_minimal() + 
  theme(legend.position = "none", axis.text.x = element_text(angle = 90, hjust = 1)) + 
  labs(x = "Skills interventions", y = "Effect size")

robu_skills

```

## Outcome group

```{r}

ggplot(dat, aes(outcome_group, g, size = 1 / SE, color = studyID)) + 
  geom_point() + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Outcome group", y = "Effect size")

robu_outcome_group
```

# Joint models for moderators

## Gender and cancer type

The gender and cancer type moderators are partially confounded because all studies of breast cancer patients were conducted with  samples of all female participants. The following panel displays the results of a joint meta-regression that includes both gender and cancer type as predictors. Neither predictor is statistically distinguishable from zero in the joint model.

```{r}
dat %>%
  mutate(type = factor(type, levels = c("other","breast"))) %>%
  filter(!is.na(type)) %>%
  robu_fit(dat = ., "g ~ type + female_pct_cent", studyID = "studyID", V = "Vg")
```

## intervention format and focus

Tom also suggested examining the interaction of intervention format (i.e., in-person versus other formats) and intervention focus (individual, dyad, or group). The following plot depicts the distribution of effect size estimates by intervention focus and format (omitting three studies with missing values on either of these variables).

```{r}
intervention_dat <- 
  dat %>%
  filter(focus != "not reported", in_person != "not reported") %>%
  mutate(focus_format = paste(focus, in_person, sep = ", "))

ggplot(intervention_dat, aes(focus_format, g, size = 1 / SE, color = studyID)) + 
  geom_point() + 
  theme_minimal() + 
  theme(legend.position = "none", axis.text.x = element_text(angle = 90, hjust = 1)) + 
  labs(x = "Intervention focus + format", y = "Effect size")

```

The following table reports the average effect size estimates for each combination of intervention format and focus. 

```{r}
intervention_summary <- 
  intervention_dat %>%
  group_by(in_person, focus) %>%
  summarise(
    studies = length(unique(studyID)),
    effects = n()
  ) %>%
  mutate(focus_format = paste(focus, in_person, sep = ", "))


robu_intervention <- robu_fit(dat = intervention_dat, "g ~ 0 + focus_format", 
                              studyID = "studyID", V = "Vg")

tidy(robu_intervention) %>%
  left_join(intervention_summary, by = c(Coef = "focus_format")) %>%
  arrange(in_person, focus) %>%
  select(format = in_person, focus, studies, effects, Est, SE, CI_L, CI_U) %>%
  as.data.frame() %>%
  pander(split.tables = 150, round = 3)
```

Among studies that used an in-person format, differences between studies using different intervention foci are not statistically distinguishable ($`r test_joint(robu_intervention, 1:3, print = TRUE)`$). Similarly,, among studies that use a different format, , differences between studies using different intervention foci are not statistically distinguishable ($`r test_joint(robu_intervention, 4:6, print = TRUE)`$).

## intervention format and class

We further considered the interaction of intervention format (i.e., in-person versus other formats) and intervention class. The following table reports the number of effects in each category of intervention class and format.

```{r}
intervention_dat <- 
  dat %>%
  filter(in_person != "not reported") %>%
  mutate(class_format = paste(intervention_class, in_person, sep = ", ")) %>%
  droplevels()

intervention_dat %>%
  group_by(intervention_class) %>%
  summarise(
    `n effects - other` = sum(in_person == "other"),
    `n effects - in-person` = sum(in_person == "in-person"),
    `in-person %` = round(100 * mean(in_person == "in-person"))
  ) %>%
  kable()

```


The following plot depicts the distribution of effect size estimates by intervention focus and class (omitting one study where format was not reported).

```{r, fig.width = 6, fig.height = 7}
class_format_labs <- rev(names(table(intervention_dat$class_format)))

ggplot(intervention_dat, aes(class_format, g, size = 1 / SE, color = in_person)) + 
  geom_point() + 
  theme_minimal() + 
  coord_flip() + 
  scale_x_discrete(limits = class_format_labs) + 
  theme(legend.position = "none") + 
  labs(x = "Intervention class + format", y = "Effect size")

```

The following table reports the average effect size estimates for each intervention class, controlling for intervention format. 

```{r}

robu_class_format <- robu_fit(dat = intervention_dat, "g ~ 0 + intervention_class + in_person", 
                              studyID = "studyID", V = "Vg")

robu_class_format
```

Controlling for intervention format, differences between intervention classes are not statistically distinguishable ($`r test_joint(robu_class_format, 1:9, print = TRUE, vcov = "CR2")`$).