. 2014 Sep 11;2014(9):CD002018. doi: 10.1002/14651858.CD002018.pub2

Yonkers 2008.

Methods	Randomisation method: pre‐determined with a computer‐generated schedule in blocked sets of 4, stratified by site. Analysis by ITT: yes (LOCF for response and remission analyses).
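The allocation method described above uses permuted blocks of 4, stratified by site. It can be illustrated with a minimal Python sketch. The study reports neither its software nor its number of sites, so the site labels, block counts and seeds below are hypothetical. The sketch only shows how blocking keeps the 2 arms balanced within every consecutive group of 4 allocations at each site.

```python
import random
from typing import Dict, List, Optional, Sequence

ARMS = ("paroxetine", "placebo")


def blocked_schedule(n_blocks: int, block_size: int = 4,
                     seed: Optional[int] = None) -> List[str]:
    """Permuted-block allocation list for a single stratum (site).

    Each block holds block_size // 2 allocations to each arm in random order,
    so the arms never differ by more than block_size // 2 at that site.
    """
    if block_size % len(ARMS) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    schedule: List[str] = []
    for _ in range(n_blocks):
        block = list(ARMS) * (block_size // len(ARMS))
        rng.shuffle(block)  # random order within the block
        schedule.extend(block)
    return schedule


def stratified_schedules(sites: Sequence[str], n_blocks: int,
                         block_size: int = 4, seed: int = 0) -> Dict[str, List[str]]:
    """One independent blocked schedule per site (site labels are hypothetical)."""
    master = random.Random(seed)
    return {site: blocked_schedule(n_blocks, block_size, master.randrange(2**32))
            for site in sites}


if __name__ == "__main__":
    for site, allocations in stratified_schedules(["site_A", "site_B"], n_blocks=3).items():
        print(site, allocations)
```

Stratifying by site means each site draws from its own list. Arms therefore stay balanced within sites as well as overall.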
Participants	Setting: community/secondary care. Women were recruited by advertisement or by referral from obstetric care providers. Country: USA. Inclusion criteria: aged ≥ 16 years; met diagnostic criteria for MDD with onset within 3 months after delivery; had given birth within the previous 9 months; and had a score of at least 16 on the 17‐item HAM‐D at the initial visit. Women who were breastfeeding were allowed to participate. Exclusion criteria: onset of MDD before delivery; current suicidal ideation with intent; current (within the last 6 months) alcohol or drug abuse or dependence; current psychotic symptoms; lifetime diagnosis of schizophrenia, bipolar disorder or schizoaffective disorder; currently receiving treatment (pharmacotherapy or psychotherapy) for a psychiatric disorder; currently pregnant; unwilling to be randomised or unable to attend treatment visits at a participating site. Number recruited: 70 women (35 active treatment, 35 placebo). Number dropped out by final week (week 8 ± 7 days): paroxetine group: 20/35 (57%); placebo group: 23/35 (66%). Number analysed: ITT analysis, and evaluation at week 8 of results from 17 women in the paroxetine group and 14 women in the placebo group. Age (mean ± SD): paroxetine: 26.1 ± 6.5 years; placebo: 25.9 ± 6.5 years. Ethnicity: paroxetine: white: 18 (51.4%), black: 5 (14.3%), Hispanic: 11 (31.4%), other: 1 (2.9%); placebo: white: 16 (45.7%), black: 4 (11.4%), Hispanic: 14 (40.0%), other: 1 (2.9%). Socioeconomic status: paroxetine: < 12 years of education: 11 (37.9%), > 12 years of education: 18 (62.1%); placebo: < 12 years of education: 15 (53.6%), > 12 years of education: 13 (46.4%).
Interventions	Women were randomly assigned to 1 of 2 groups. Paroxetine: weeks 1 and 2: 1 capsule (10 mg) of immediate‐release paroxetine daily; weeks 3 and 4: 2 capsules (20 mg) of immediate‐release paroxetine daily, unless side effects limited an increase. Further increments to 30 mg by week 4 and then to 40 mg by week 6 were encouraged if improvement was assessed as < 30% compared with baseline. Placebo: identical placebo administered according to the same protocol as paroxetine.
Outcomes	All primary outcomes listed were assessed at weeks 1, 2, 3, 4 and 6, and at a final visit at week 8 (± 7 days). Primary outcome: change in depressive symptoms, measured by the HAM‐D, the CGI and the Inventory of Depressive Symptomatology ‐ Self‐Report. Secondary outcomes: rates of remission, defined as a HAM‐D score of ≤ 8, and of response, defined as a CGI‐Improvement scale score of 1 or 2; predictors of remission (as defined above); social adjustment, measured by the SAS; SF‐36.
Notes	This study was supported by a Collaborative Research Trial, Investigator‐Initiated grant from GlaxoSmithKline to Drs Yonkers and Cohen and by National Institute of Mental Health grant MH01648 to Dr Yonkers. See footnote for abbreviations and descriptions of outcome measures.
*Risk of bias*
Bias	Authors' judgement	Support for judgement
Random sequence generation (selection bias)	Low risk	"Subjects were randomly assigned to take identical capsules of either paroxetine or placebo. Random assignment was predetermined with a computer‐generated schedule in blocked sets of 4 and was stratified by site. A study statistician was responsible for random assignment"
Allocation concealment (selection bias)	Unclear risk	Insufficient detail provided to judge whether allocation was concealed
Blinding (performance bias and detection bias) of participants	Low risk	"Subjects were instructed to take 1 capsule (10mg of immediate‐release paroxetine or identical placebo)"
Blinding (performance bias and detection bias) of personnel	Low risk	"A study statistician was responsible for random assignment, and remaining study staff were blind to group assignment."
Blinding (performance bias and detection bias) of outcome assessors	Low risk	"…remaining study staff were blind to group assignment"
Incomplete outcome data (attrition bias), all outcomes	High risk	"Seventy women qualified for the study, and 31 completed study treatment… Subjects withdrew from the active treatment for the following reasons: 1 due to an adverse event (nausea), 6 due to lack of efficacy, including 1 subject who was psychiatrically hospitalised, 6 who were lost to follow‐up, 5 who felt well and no longer desired treatment, 1 who became pregnant and 1 who was noncompliant. In subjects randomly assigned to placebo, 4 left the study because of perceived adverse events (rash, nausea, diarrhoea, headache), 7 discontinued because of lack of efficacy, including 1 subject who required hospitalisation, 9 were lost to follow‐up, 2 improved and no longer desired treatment, and 1 subject moved" "Given the high rate of dropout, we explored additional models to assess the robustness of remission results. These models first assumed that all dropouts were remitters and then that they were all nonremitters. In both models, treatment with paroxetine remained significantly better than treatment with placebo" Drop‐out numbers are similar in the 2 groups, and some reasons account for similar numbers in both groups. However, the reason for drop‐out is unknown for a substantial proportion of participants ("lost to follow‐up"). Sensitivity analyses were performed only for the primary outcome.
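The 2 sensitivity models quoted above amount to simple best‐case/worst‐case imputation. The first counts every drop‐out as a remitter; the second counts every drop‐out as a nonremitter. The Python sketch below computes remission proportions and the risk difference under each assumption. The counts are hypothetical placeholders, not data from Yonkers 2008, and the trial's significance testing is not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    randomised: int         # participants randomised to the arm
    remitted_observed: int  # remitters among participants with a final assessment
    dropouts: int           # participants with no final assessment


def remission_proportion(arm: Arm, dropouts_remit: bool) -> float:
    """Remission proportion when every drop-out is imputed as a remitter (True) or a nonremitter (False)."""
    imputed = arm.dropouts if dropouts_remit else 0
    return (arm.remitted_observed + imputed) / arm.randomised


def risk_difference(active: Arm, control: Arm, dropouts_remit: bool) -> float:
    """Active minus control remission proportion under one imputation assumption."""
    return (remission_proportion(active, dropouts_remit)
            - remission_proportion(control, dropouts_remit))


# Hypothetical counts for illustration only.
active = Arm(randomised=30, remitted_observed=9, dropouts=12)
control = Arm(randomised=30, remitted_observed=4, dropouts=14)
for assumption in (True, False):
    label = "all drop-outs remit" if assumption else "no drop-outs remit"
    print(f"{label}: RD = {risk_difference(active, control, assumption):.3f}")
```

If the direction of the effect is the same under both extreme assumptions, missing outcomes alone cannot reverse it. The size of the effect can still change considerably.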
Selective reporting (reporting bias)	High risk	The Social Adjustment Scale and SF‐36 were included in the methods but not reported in the results
Other bias	Unclear risk	"Pill counts revealed that, among women assigned to paroxetine, 7 were noncompliant (took less than 80% of prescribed pills) at 1 visit, and 4 were non‐compliant at 2 visits. One subject assigned to active treatment was discontinued due to on‐going lack of compliance; of the remaining subjects, no others fell below the 80% compliance rate at more than 2 visits. Among subjects assigned to placebo, 10 were noncompliant at 1 visit, 3 were noncompliant during at least 2 visits, and 1 was noncompliant on 4 occasions" The potential for bias is unclear because we do not know whether non‐compliant women were taking 0% or 79% of their medication. It is also unclear whether the numbers of non‐compliant participants were reported for the study as a whole (26/70 women) or only for those who did not drop out (26/31 women).

Abbreviations: BPD: brief dynamic psychotherapy; CBT: cognitive behavioural therapy; CGI: Clinical Global Impression; CIS‐R: Revised Clinical Interview Schedule; DSM‐IV: Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; EPDS: Edinburgh Postnatal Depression Scale; GAS: Global Assessment Scale; GHQ: General Health Questionnaire; GRIMS: Golombok Rust Inventory of Marital State; HAM‐A: Hamilton Rating Scale for Anxiety; HAM‐D: Hamilton Rating Scale for Depression; ICD‐10: International Classification of Diseases, Tenth Revision; ITT: intention to treat; LOCF: last observation carried forward; MADRS: Montgomery‐Åsberg Depression Rating Scale; MAMA: Maternal Adjustment and Maternal Attitudes; MDD: major depressive disorder; MHI: Mental Health Index; PAPA: Preschool Age Psychiatric Assessment; SAS: Social Adjustment Scale; SCID: Structured Clinical Interview for DSM‐IV; SD: standard deviation; SF‐12: 12‐item Short Form; SF‐36: 36‐item Short Form; SPQ: Social Problems Questionnaire; YBOCS: Yale‐Brown Obsessive Compulsive Scale.

CIS‐R is a structured diagnostic interview schedule for the diagnosis of common mental disorders. The CIS‐R is widely used in population and primary care surveys to provide estimates of depression.

CGI‐Improvement scale is a clinician‐rated scale that assesses change in symptoms. It is rated on a 7‐point scale: 1 = very much improved; 2 = much improved; 3 = minimally improved; 4 = no change; 5 = minimally worse; 6 = much worse; 7 = very much worse. Each component of the CGI is rated separately, and the scales do not yield a global score.

CGI‐Severity of Illness is a clinician‐rated scale that assesses the severity of symptoms. It is rated on a 7‐point scale: 1 = not at all ill; 2 = borderline mentally ill; 3 = mildly ill; 4 = moderately ill; 5 = markedly ill; 6 = severely ill; 7 = extremely ill.

EPDS is a 10‐item self‐administered screening questionnaire for perinatal depression, validated in 20 languages. For each item, women select the 1 of 4 responses that most closely describes how they have felt over the past 7 days. Each response has a value of 0‐3, and the scores for the 10 items are summed to give a total score between 0 and 30. The EPDS is the most widely used screening instrument for postpartum depression. Its positive predictive value for postnatal major depression is 9‐64% with a cut‐off score of 9/10, or 17‐100% with a cut‐off of 12/13. A cut‐off score of 12/13 is used in most studies to indicate postpartum depression. The EPDS does not discriminate between levels of depression severity, and additional information is required to establish that diagnostic criteria for depression are met.
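The EPDS scoring rule above can be written as a short Python sketch: 10 items, each scored 0‐3, summed to a total of 0‐30, with a 12/13 cut‐off. The sketch assumes each response has already been converted to its 0‐3 value using the questionnaire's scoring key, including any reverse‐scored items. The function names and example responses are illustrative and are not part of any published scoring tool.

```python
from typing import Sequence

N_ITEMS = 10
ITEM_VALUES = range(0, 4)  # each keyed response is worth 0, 1, 2 or 3


def epds_total(responses: Sequence[int]) -> int:
    """Sum the 10 keyed item values (0-3 each) into a 0-30 total."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"EPDS has {N_ITEMS} items, got {len(responses)}")
    if any(r not in ITEM_VALUES for r in responses):
        raise ValueError("each EPDS item must be scored 0-3")
    return sum(responses)


def epds_screen_positive(total: int, cutoff: int = 13) -> bool:
    """Cut-off written as 12/13: totals of 12 or less screen negative, 13 or more positive.

    Pass cutoff=10 for the 9/10 threshold.
    """
    return total >= cutoff


answers = [2, 1, 2, 1, 1, 2, 1, 1, 1, 1]  # hypothetical keyed responses
total = epds_total(answers)
print(total, epds_screen_positive(total))
```

The cut‐off is a parameter because both thresholds mentioned above are in use. Lowering it raises sensitivity at the cost of positive predictive value.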

EQ‐5D is a preference‐based measure of health‐related quality of life measured on 5 dimensions (i.e. mobility, self care, usual activities, pain/discomfort and anxiety/depression), each rated on 3 levels (i.e. no problems, some problems and severe problems). Participants are classified into 1 of 243 health states, each associated with a score that can be used to calculate quality‐adjusted life years. The measure has been extensively used in health economic evaluations and its psychometric properties are adequate.
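The figure of 243 health states follows from 5 dimensions with 3 levels each (3^5 = 243). The Python sketch below enumerates them as 5‐digit profiles; for example, 11111 means no problems on any dimension. Converting a profile into a utility score requires a country‐specific value set, which is not shown here.

```python
from itertools import product
from typing import List

DIMENSIONS = ("mobility", "self care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVELS = (1, 2, 3)  # 1 = no problems, 2 = some problems, 3 = severe problems


def all_health_states() -> List[str]:
    """Every 3-level EQ-5D profile as a 5-digit string, one digit per dimension."""
    return ["".join(str(level) for level in combo)
            for combo in product(LEVELS, repeat=len(DIMENSIONS))]


states = all_health_states()
print(len(states), states[0], states[-1])  # 243 11111 33333
```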

GAS is a rating scale for evaluating the overall functioning of a person during a specified time period on a continuum from psychological or psychiatric sickness to health.

GRIMS is a 28‐item self‐completed questionnaire that assesses the quality of the relationship between a married or cohabiting couple.

HAM‐A is a clinician‐rated screening instrument that assesses the presence and severity of anxiety. Each item is scored 0‐4 (symptom absent, mild, moderate, severe or very severe), and the total score is obtained by summing the item scores. For the 14‐item HAM‐A, total scores range from 0 to 56. A score of 0‐13 indicates no anxiety; 14‐17, mild anxiety; 18‐24, moderate anxiety; and 25‐30, severe anxiety.
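The HAM‐A total and the severity bands above can be expressed as a small Python sketch. The bands as stated stop at 30, so the sketch treats any total of 25 or more as severe. Extending that band to cover 31‐56 is an assumption, and the function names are illustrative.

```python
from typing import Sequence

HAM_A_ITEMS = 14


def ham_a_total(items: Sequence[int]) -> int:
    """Sum 14 item scores, each 0 (absent) to 4 (very severe), into a 0-56 total."""
    if len(items) != HAM_A_ITEMS:
        raise ValueError(f"expected {HAM_A_ITEMS} items, got {len(items)}")
    if any(not 0 <= s <= 4 for s in items):
        raise ValueError("HAM-A items are scored 0-4")
    return sum(items)


def ham_a_band(total: int) -> str:
    """Map a total score to the severity bands quoted in the text."""
    if not 0 <= total <= 4 * HAM_A_ITEMS:
        raise ValueError("HAM-A total must lie between 0 and 56")
    if total <= 13:
        return "no anxiety"
    if total <= 17:
        return "mild"
    if total <= 24:
        return "moderate"
    # Text lists 25-30; totals above 30 are also treated as severe here (assumption).
    return "severe"


print(ham_a_band(ham_a_total([1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1])))
```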

HAM‐D is a clinician‐rated screening instrument that assesses the presence and severity of depression. Each item is scored either 0‐4 (from symptom absent to severe) or 0‐2 (absent, slight or trivial, or clearly present), and the total score is obtained by summing the item scores. For the 17‐item HAM‐D, total scores range from 0 to 52. A score of 0‐6 indicates no depression; 7‐17, mild depression; 18‐24, moderate depression; and ≥ 25, severe depression. A post‐treatment total score of ≤ 7 is commonly used to indicate remission, and a decrease of 50% or more from baseline is commonly taken to indicate a clinically significant change.
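The severity bands, remission threshold and 50% response criterion above can be sketched in Python as follows. The remission threshold is a parameter because definitions differ between trials. For example, Yonkers 2008 defined remission as a HAM‐D score of ≤ 8 and defined response using the CGI‐Improvement scale rather than a HAM‐D reduction. The function names are illustrative.

```python
def ham_d_band(total: int) -> str:
    """Map a 17-item HAM-D total to the severity bands quoted in the text."""
    if total < 0:
        raise ValueError("HAM-D total cannot be negative")
    if total <= 6:
        return "no depression"
    if total <= 17:
        return "mild"
    if total <= 24:
        return "moderate"
    return "severe"


def is_remission(post_total: int, threshold: int = 7) -> bool:
    """Remission: post-treatment total at or below the threshold (<= 7 by default)."""
    return post_total <= threshold


def is_response(baseline: int, post_total: int) -> bool:
    """Response: a fall of at least 50% from baseline, checked in integer arithmetic."""
    if baseline <= 0:
        raise ValueError("baseline total must be positive")
    return 2 * (baseline - post_total) >= baseline


# Hypothetical participant: baseline 20, week-8 score 9.
print(ham_d_band(20), ham_d_band(9), is_remission(9, threshold=8), is_response(20, 9))
```

Checking `2 * (baseline - post_total) >= baseline` avoids floating‐point division, so a drop of exactly 50% is always counted as a response.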

MADRS is a clinician‐rated scale that measures the severity of depressive episodes. Each of the 10 items is scored 0‐6, and the item scores are summed to give a total score between 0 and 60. A score of 0‐6 indicates no depression; 7‐19, mild depression; 20‐34, moderate depression; and ≥ 35, severe depression.
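The MADRS total and severity bands above translate directly into a short Python sketch. The function names and example item scores are illustrative, and the bands are exactly those quoted in the text.

```python
from typing import Sequence

MADRS_ITEMS = 10


def madrs_total(items: Sequence[int]) -> int:
    """Sum 10 item scores, each 0-6, into a 0-60 total."""
    if len(items) != MADRS_ITEMS:
        raise ValueError(f"expected {MADRS_ITEMS} items, got {len(items)}")
    if any(not 0 <= s <= 6 for s in items):
        raise ValueError("MADRS items are scored 0-6")
    return sum(items)


def madrs_band(total: int) -> str:
    """Map a total score to the severity bands quoted in the text."""
    if not 0 <= total <= 6 * MADRS_ITEMS:
        raise ValueError("MADRS total must lie between 0 and 60")
    if total <= 6:
        return "no depression"
    if total <= 19:
        return "mild"
    if total <= 34:
        return "moderate"
    return "severe"


print(madrs_band(madrs_total([3, 2, 3, 2, 2, 3, 2, 2, 3, 2])))
```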

MAMA is a self‐administered questionnaire that examines perceptions of maternal adjustment and attitudes towards the marital relationship and the baby. Its postnatal subscale comprises 12 items rated on a 4‐point scale from 1 = "not at all" to 4 = "very much".

SF‐12 is a 12‐item self‐completed questionnaire that measures functional health and well‐being. It is a widely used, well‐validated generic measure of functional quality of life.

SPQ is a 33‐item self‐report questionnaire covering 10 areas or domains, including housing conditions; occupation; financial status; social and leisure activities; contacts with relatives, friends and neighbours; family functioning; child‐parent interaction; relationship with spouse or partner; and legal matters. Each item is rated on a 4‐point scale ranging from 0 (no social difficulties/satisfactory adjustment) to 3 (severe social difficulties/very poor adjustment).