Abstract
Background
The brain is a critical target organ for thyroid hormone, but it is unclear whether variations in thyroid function within and near the reference range affect quality of life, mood, or cognition.
Methods
A total of 138 subjects with levothyroxine (L-T4)-treated hypothyroidism and normal thyrotropin (TSH) levels underwent measures of quality of life (36-Item Short Form Health Survey, Underactive Thyroid-Dependent Quality of Life Questionnaire), mood (Profile of Mood States, Affective Lability Scale), and cognition (executive function, memory). They were then randomly assigned to receive an unchanged, higher, or lower L-T4 dose in double-blind fashion, targeting one of three TSH ranges (0.34 to 2.50, 2.51 to 5.60, or 5.61 to 12.0 mU/L). Doses were adjusted every 6 weeks based on TSH levels. Baseline measures were reassessed at 6 months.
Results
At the end of the study, by intention to treat, mean L-T4 doses were 1.50 ± 0.07, 1.32 ± 0.07, and 0.78 ± 0.08 μg/kg (P < 0.001), and mean TSH levels were 1.85 ± 0.25, 3.93 ± 0.38, and 9.49 ± 0.80 mU/L (P < 0.001), respectively, in the three arms. There were minor differences in a few outcomes between the three arms, which were no longer significant after correction for multiple comparisons. Subjects could not ascertain how their L-T4 doses had been adjusted (P = 0.55) but preferred L-T4 doses they perceived to be higher (P < 0.001).
Conclusions
Altering L-T4 doses in hypothyroid subjects to vary TSH levels in and near the reference range does not affect quality of life, mood, or cognition. L-T4-treated subjects prefer perceived higher L-T4 doses despite a lack of objective benefit. Adjusting L-T4 doses in hypothyroid patients based on symptoms in these areas may not result in significant clinical improvement.
L-T4 doses were adjusted in hypothyroid subjects to achieve low-normal, high-normal, or mildly elevated TSH levels. No effects were seen in quality of life, mood, or cognition after 6 months.
Overt hypothyroidism interferes with brain functions (1), but effects of variations in thyroid function within and near the reference range are less clear. Observational studies of this issue have been inconsistent, and a few randomized, blinded intervention studies have been negative (1–11). In the absence of consensus, many patients with mild thyrotropin (TSH) elevations are treated with levothyroxine (L-T4) to improve neurocognitive symptoms, and L-T4 doses are often increased to treat persistent symptoms.
We recruited hypothyroid subjects treated with L-T4 who underwent testing for health status, mood, and cognitive function. We targeted cognitive domains preferentially affected by mild thyroid dysfunction (memory and executive function) (1). We then adjusted subjects’ L-T4 doses in a blinded fashion over 6 months to achieve one of three TSH ranges (low-normal, high-normal, or mildly elevated), and repeated the tests. We hypothesized that altering TSH levels in these ranges would affect quality of life, mood, memory, and executive function.
Experimental Subjects
A total of 197 hypothyroid subjects receiving L-T4 monotherapy were recruited from the authors’ clinics, through review of electronic health records, and by flyers. All were diagnosed as adults and had past elevated TSH levels. L-T4 doses were stable for ≥3 months. None had acute or chronic illnesses or were on medications that affect thyroid hormone levels, mood, or cognition. Stable doses of oral contraceptive or estrogen therapy were allowed. In premenopausal women, testing was done during the first 14 days after onset of menstrual bleeding or of an oral contraceptive cycle.
Materials and Methods
Experimental design
The protocol was approved by the Oregon Health & Science University (OHSU) Institutional Review Board. Subjects gave written informed consent.
Screening visit
Subjects were screened for general health, medicines, thyroid status, and mood or cognitive disorders by history, physical examination, and laboratory testing. General intelligence was estimated by the Wechsler Adult Intelligence Scale–Revised (WAIS-R) Vocabulary subtest (12).
Run-in visits
Subjects taking branded L-T4 with normal screening TSH levels proceeded directly to the baseline visit. Subjects who had abnormal screening TSH levels or were taking generic L-T4 were placed on branded L-T4 and underwent run-in visits every 6 weeks with L-T4 dose adjustments until doses were stable, with normal TSH levels for 3 months.
Baseline visit
Within 6 weeks of the screening or final run-in visit, subjects returned for a 4-hour baseline visit. Subjects refrained from taking their L-T4 dose that morning. Serum TSH, free thyroxine (fT4), and free triiodothyronine (fT3) levels were obtained. Subjects self-completed the following validated surveys:
Billewicz scale of hypothyroid-related symptoms (13)
Underactive Thyroid-Dependent Quality of Life Questionnaire, which measures the impact of hypothyroidism on quality of life (14)
36-Item Short Form Health Survey (SF-36), a general health questionnaire (15)
Profile of Mood States (POMS), a mood questionnaire (16)
Affective Lability Scale, where subjects rate the tendency of their moods to fluctuate (17)
Cognitive tests were administered by a single experienced research assistant.
Executive function
Attention and concentration
Letter Cancellation Test.
The subject was given a sheet of paper with 6 lines of 52 letters in random sequence and instructed to circle two specified target letters as quickly as possible. The score was the number of errors and time needed (18).
Cognitive flexibility
Trail Making Test.
The subject connected circles on a sheet of paper as quickly as possible. In Part A, the subject drew lines to connect numbered circles in ascending order. In Part B, the subject drew lines to connect circles in ascending order, alternating between numbers and letters. The score was the number of errors and time needed (19).
Decision making
Iowa Gambling Task.
Four decks of cards were shown face down on a computer screen. The subject chose cards from any deck, resulting in the gain or loss of money. The subject was unaware that two decks were advantageous (small gains, smaller losses), and two were disadvantageous (large gains, larger losses). The subject’s choices were classified as advantageous (X) or disadvantageous (Y), with a net score of X − Y, over 5 trials of 100 cards each (20).
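The net score described above can be illustrated with a minimal Python sketch. The deck labels (A to D), the choice of C and D as the advantageous decks, and the function name are assumptions made for illustration and are not taken from the study software; the block size is left as a parameter.

```python
from typing import List

# Assumed labels: decks C and D are treated as the two advantageous decks.
ADVANTAGEOUS = {"C", "D"}

def net_scores(choices: List[str], block_size: int) -> List[int]:
    """Return the net score X - Y for each consecutive block of card choices."""
    scores = []
    for start in range(0, len(choices), block_size):
        block = choices[start:start + block_size]
        x = sum(1 for deck in block if deck in ADVANTAGEOUS)  # advantageous picks
        y = len(block) - x                                    # disadvantageous picks
        scores.append(x - y)
    return scores
```

A positive value for a block means the subject drew more often from the advantageous decks in that block.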
Working memory
N-Back test.
A series of letters was presented one at a time on a computer screen. Subjects responded each time a letter appeared that they had seen on the previous screen (1-back). The task was repeated with intervening letters imposed while the subjects had to hold in mind letters that had appeared 2-back and then 3-back. The score was the total number correct on target and the total number incorrect nontarget for each condition (21).
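A minimal sketch of how the two N-Back counts could be tallied is shown below. It assumes one recorded response (responded or not) per presented letter; the function name and data layout are hypothetical and do not describe the study's testing software.

```python
from typing import List, Tuple

def score_n_back(letters: List[str], responses: List[bool], n: int) -> Tuple[int, int]:
    """Return (correct on target, incorrect nontarget) for one n-back condition."""
    correct_target = 0
    incorrect_nontarget = 0
    for i, responded in enumerate(responses):
        # A letter is a target when it matches the letter shown n screens earlier.
        is_target = i >= n and letters[i] == letters[i - n]
        if responded and is_target:
            correct_target += 1
        elif responded and not is_target:
            incorrect_nontarget += 1
    return correct_target, incorrect_nontarget
```

Calling the function with `n` set to 1, 2, or 3 gives the pair of counts for the corresponding condition.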
Subject-Ordered Pointing.
Subjects viewed a series of computer screens that presented abstract drawings (6, 8, 10, or 12 per screen). Each screen in a set showed the same array of drawings but in a different spatial arrangement. The subject indicated one drawing per screen, avoiding the same drawing on subsequent screens. Subjects erred when they chose a drawing that had been previously chosen. Each set was repeated three times. The score was the total number of errors across each screen set (22).
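The error rule for a single run through a set (an error is any drawing that was already chosen on an earlier screen) can be sketched as follows. The function name and the representation of drawings as identifiers are assumptions for illustration only.

```python
from typing import Hashable, Iterable

def count_pointing_errors(choices: Iterable[Hashable]) -> int:
    """Count choices that repeat a drawing already chosen earlier in the set."""
    seen = set()
    errors = 0
    for drawing in choices:
        if drawing in seen:
            errors += 1  # the drawing had been chosen on a previous screen
        seen.add(drawing)
    return errors
```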
Declarative memory
Paragraph Recall (verbal memory)
Subjects were read a brief story and verbally recalled it immediately and after 30 minutes. The score was the total number of story elements recalled at each interval (23).
Motor learning
Pursuit Rotor
Subjects held a photosensitive wand to maintain contact with a 2-cm light disk rotating on a turntable (Lafayette Instrument Company, Lafayette, IN). Two blocks of eight 20-second trials were administered, with a 20-second rest after each trial and a 60-second rest period after four trials. After a 30-minute interval, the two blocks were repeated. The score was the mean total time the stylus remained on target (24).
Motor Sequence Learning Test
The subject memorized two keypress sequences, each associated with a letter of the alphabet. As soon as that letter appeared on the computer screen, the subject performed the appropriate sequence as quickly as possible. Subjects performed 10 blocks of 18 trials each. The score was the total movement time (time from character presentation to completion of the sequence) (25).
Randomization
Immediately after the baseline visit, subjects were randomly assigned to one of three arms: low-normal TSH (0.34 to 2.50 mU/L), high-normal TSH (2.51 to 5.60 mU/L), or mildly elevated TSH (5.61 to 12.0 mU/L). These ranges were based on the OHSU TSH assay reference range, recent debate over restricting the upper limit to 2.50 mU/L (26), and our intention to restrict elevated TSH levels to the subclinical hypothyroid range. Randomization was stratified by whether the subject’s baseline TSH was low- or high-normal.
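One way to picture stratified assignment is the Python sketch below. It assumes shuffled blocks of three within each baseline stratum, which is a common scheme but is not stated in the text; the function names, the seed, and the block scheme are illustrative assumptions, not the study's actual randomization procedure.

```python
import random
from typing import Dict, List

ARMS = ["low-normal", "high-normal", "mildly elevated"]

def baseline_stratum(tsh: float) -> str:
    """Classify a normal baseline TSH (mU/L) as low-normal or high-normal."""
    return "low-normal" if tsh <= 2.50 else "high-normal"

def stratified_assign(baseline_tsh: Dict[str, float], seed: int = 0) -> Dict[str, str]:
    """Assign subjects to arms in shuffled blocks of three within each stratum."""
    rng = random.Random(seed)
    strata: Dict[str, List[str]] = {"low-normal": [], "high-normal": []}
    for subject, tsh in baseline_tsh.items():
        strata[baseline_stratum(tsh)].append(subject)
    assignment: Dict[str, str] = {}
    for members in strata.values():
        for start in range(0, len(members), len(ARMS)):
            block = ARMS[:]      # one copy of each arm per block (assumed scheme)
            rng.shuffle(block)
            for subject, arm in zip(members[start:start + len(ARMS)], block):
                assignment[subject] = arm
    return assignment
```

Blocking within strata keeps the three arms balanced among subjects who start with low-normal TSH and, separately, among those who start with high-normal TSH.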
L-T4 dosing
Taking into account baseline TSH levels, the dispensing physician (K.G.S.) initially determined whether subjects should continue their usual L-T4 dose or receive a different dose to achieve the assigned target TSH ranges. If a different dose was indicated, the subject’s usual dose was altered by 25 to 50 μg, depending on the difference between the initial and target TSH levels. The principal investigator (M.H.S.), research assistants, and subjects were unaware of the treatment assignment or L-T4 doses. The OHSU research pharmacy dispensed 6-week supplies of L-T4 pills in opaque gel capsules to maintain blinding.
Interim visits
At 6, 12, and 18 weeks, subjects returned for brief visits. The principal investigator assessed clinical effects and determined whether the subject could comfortably continue the study. TSH levels from these visits were reviewed by K.G.S., who adjusted L-T4 doses if the interim TSH level was not in the target range. L-T4 doses were adjusted by 12.5 to 50 μg depending on the difference between the interim and target TSH levels, and the research pharmacy dispensed new 6-week supplies. Additional interim visits were allowed if the TSH level was not in the target range at 18 weeks. Once the TSH was in the target range, no further interim visits were scheduled, and the subject proceeded to the end-of-study visit.
End-of-study visit
Approximately 6 weeks after the final interim visit, baseline measurements were repeated. At this visit, TSH, fT4, and fT3 levels were measured, and this TSH level was subsequently used to assign subjects to actual end-of-study TSH arms for the purposes of data analysis. The subject was then placed back on his or her usual L-T4 dose, or a dose that led to better TSH control during the study, per subject preference.
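The reassignment to actual end-of-study arms amounts to a lookup against the three target ranges, sketched below. How values outside 0.34 to 12.0 mU/L were handled is not stated in the text, so returning `None` for them is an assumption, as is the function name.

```python
from typing import Optional

def end_of_study_arm(tsh: float) -> Optional[str]:
    """Map an end-of-study TSH level (mU/L) to one of the three study ranges."""
    if 0.34 <= tsh <= 2.50:
        return "low-normal"
    if 2.50 < tsh <= 5.60:
        return "high-normal"
    if 5.60 < tsh <= 12.0:
        return "mildly elevated"
    return None  # assumption: values outside all three ranges are left unassigned
```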
Analytic methods
TSH was measured by immunochemiluminometric assay (Beckman Coulter): functional sensitivity 0.02 mU/L, normal range 0.34 to 5.60 mU/L, and interassay coefficient of variation (CV) 5% at 0.70 mU/L. fT4 was measured by direct equilibrium dialysis (Quest Diagnostics): sensitivity 0.08 ng/dL, normal range 0.8 to 2.7 ng/dL, and interassay CV 6.8% at 0.3 ng/dL and 1.6% at 3.8 ng/dL. fT3 was measured by tracer dialysis (Quest Diagnostics): sensitivity 25 pg/dL, normal range 210 to 440 pg/dL, and interassay CV 4%. TSH levels were measured at the time of testing, with stable assay characteristics during the study. fT4 and fT3 levels were batched and analyzed at the end of the study. All samples were run in duplicate.
Statistical methods
Differences between arms for continuous measures were analyzed with multiple linear regression models adjusted for age, sex and estrogen status, years of education, WAIS-R score, baseline body mass index (BMI), change in BMI, baseline TSH (low- vs high-normal), standard deviation of TSH values at interim and last visits, baseline L-T4 dose (μg/kg), time on L-T4, time on L-T4 dose, and baseline value of the outcome variable. Binary measures were analyzed with multiple logistic regression models that adjusted for baseline TSH (low- vs high-normal) and baseline values of the outcomes because of the more limited nature of the binary data. For outcomes with significant differences between arms, the Tukey multiple comparison procedure was used to determine which arms were significantly different and adjust P values for pairwise differences between arms. Because not all outcomes were independent, multiple testing P value adjustments were made for groups of outcomes that included a significant outcome. These outcomes included the SF-36 summary and subscales (10 outcomes total), the POMS subscales (6 outcomes total), N-Back (correct and incorrect variables for 1-back, 2-back, 3-back; 6 outcomes total), and Letter Cancellation Test (time and error, 2 outcomes total). Analyses were conducted as intention-to-treat and by the actual TSH arms subjects achieved at the end of the study. We also examined relationships between outcomes and TSH, fT4, or fT3 by using the same regression analyses but substituting, in separate models, the selected hormone for the categorical arms variable. All analyses were conducted in R version 3.3.2 (R Foundation for Statistical Computing) (27).
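To make the effect of grouping outcomes concrete, the sketch below applies a Bonferroni adjustment (the correction named in the table footnotes) using the group sizes listed above. The input P value of 0.03 is an illustrative number, not a reported Tukey-adjusted value, and the function name is hypothetical. The study itself was analyzed in R; Python is used here only for illustration.

```python
# Number of outcomes in each group, as listed in the statistical methods.
GROUP_SIZES = {
    "SF-36": 10,
    "POMS": 6,
    "N-Back": 6,
    "Letter Cancellation Test": 2,
}

def bonferroni(p_value: float, n_outcomes: int) -> float:
    """Multiply a P value by the number of grouped outcomes, capped at 1."""
    return min(1.0, p_value * n_outcomes)

if __name__ == "__main__":
    illustrative_p = 0.03  # assumed input, for illustration only
    for group, size in GROUP_SIZES.items():
        adjusted = bonferroni(illustrative_p, size)
        print(f"{group}: adjusted P = {adjusted:.2f}")
```

Under this adjustment, a pairwise P value of 0.03 exceeds 0.05 in every group, including the smallest group of two outcomes.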
Results
Demographic, clinical, and thyroid function parameters
Figure 1 provides a flowchart of the study design and subject enrollment. Of the 197 subjects initially screened, 24 were excluded because of abnormal laboratory tests [low-density lipoproteins >160 mg/dL (n = 11), glucose >120 mg/dL (n = 2), elevated serum calcium (n = 1)], abnormal electrocardiogram (n = 3), TSH out of range (n = 5), or medical issues (n = 2). Fifty subjects were taking branded L-T4 and had normal screening TSH levels, and they proceeded directly to the baseline visit. One hundred twenty-three subjects were taking generic L-T4 or had abnormal screening TSH levels and proceeded to the run-in. One hundred fifty-one subjects completed the baseline visit. Twenty-two subjects withdrew during the run-in [personal issues (n = 17), started other medications (n = 2), started weight loss diet (n = 1), medical issues (n = 2)]. Thirteen subjects withdrew before the final visit [personal issues (n = 5), medical issues (n = 6), pregnancy (n = 1), started weight loss diet (n = 1)]. Seven of these withdrew before the 6-week interim visit, 4 before the 12-week interim visit, and 2 before the 18-week interim visit. Subjects who were excluded or withdrew were not different from the study population in demographic or clinical attributes.
Figure 1.
Flowchart of study design and enrollment.
A total of 138 subjects completed the study (125 female, 13 male). They were aged 27 to 70 years and were receiving L-T4 for primary hypothyroidism (n = 112), hypothyroidism after iodine-131 therapy for Graves disease (n = 17), postpartum thyroiditis leading to permanent hypothyroidism (n = 3), or thyroid surgery (n = 6). They had received L-T4 for 5 months to 50 years (mean 12 years). Mean time on current L-T4 dose was 1.6 years. Baseline data from these subjects have been published (28).
During the run-in, 92 subjects (67%) switched from generic to branded L-T4, and 36 (26%) needed L-T4 dose adjustment. Percentages of subjects needing a run-in or dose adjustment did not differ between the three arms (P = 0.50). At baseline, 87 subjects (63%) had low-normal TSH and 51 (37%) had high-normal TSH levels. Nineteen subjects (14%) did not need L-T4 dose adjustments at interim visits, and 119 (86%) needed 1 to 5 additional dose adjustments (mean 2.1). Forty-five subjects (33%) did not achieve their intended target TSH ranges (17%, 64%, and 16% in the low-normal, high-normal, and mildly elevated TSH arms, respectively). It was particularly difficult to maintain subjects in the high-normal TSH arm, because small changes in TSH levels near the lower or upper cutoffs of this arm moved subjects into one of the other two arms. For this reason, we conducted two separate analyses, one as intention-to-treat by randomized arm and one based on actual TSH levels at the end-of-study visit. Results are presented for the intention-to-treat analysis first, followed by the analysis by actual end-of-study arm.
By intention to treat, subjects in the three arms did not differ in age, WAIS-R score, years in school, sex, estrogen status, ethnicity, BMI, or duration at current L-T4 dose (Table 1). Duration of L-T4 treatment was longer in the high-normal TSH arm (P < 0.001). Mean L-T4 doses at the end of the study were progressively lower in the three arms (1.50 ± 0.07, 1.32 ± 0.07, and 0.78 ± 0.08 μg/kg/day, respectively, P < 0.001), whereas mean TSH levels were progressively higher (1.85 ± 0.25, 3.93 ± 0.38, and 9.49 ± 0.80 mU/L, P < 0.001). Mean fT4 levels were lower in the mildly elevated TSH arm (1.79 ± 0.06, 1.64 ± 0.07, and 1.34 ± 0.05 ng/dL, respectively, P < 0.001), whereas mean fT3 levels were not significantly different between the three arms (201.4 ± 6.0, 191.4 ± 6.2, and 184.1 ± 6.6 pg/dL, P = 0.15). Seventy-two subjects (52%) had low baseline fT3 levels (118 to 209 pg/dL). At the end of the study, 28 subjects in the low-normal TSH arm (61%), 34 in the high-normal TSH arm (72%), and 34 in the mildly elevated TSH arm (76%) had low fT3 levels (82 to 209 pg/dL).
Table 1.
Clinical Parameters and Thyroid Function Tests at Baseline and End of Study, Analyzed as Intention to Treat
| | Baseline | Arm 1 Low-Normal TSH (End of Study) | Arm 2 High-Normal TSH (End of Study) | Arm 3 Mildly Elevated TSH (End of Study) | P |
|---|---|---|---|---|---|
| No. of subjects | 138 | 46 | 47 | 45 | |
| Age, y | 49.2 ± 1 | 49.5 ± 1.7 | 50.9 ± 1.8 | 49.3 ± 1.6 | 0.77 |
| WAIS-R scoreᵃ | 10.9 ± 0.2 | 10.8 ± 0.3 | 11.0 ± 0.3 | 10.8 ± 0.3 | 0.88 |
| Years in schoolᵃ | 15.9 ± 0.3 | 15.6 ± 0.5 | 16.1 ± 0.4 | 15.9 ± 0.4 | 0.74 |
| Sexᵃ | 91% Female | 41 (89.1%) | 41 (87.2%) | 43 (95.6%) | 0.45 |
| | 9% Male | 5 (10.9%) | 6 (12.8%) | 2 (4.4%) | |
| Estrogen statusᵃ | 9% Male | 5 (10.9%) | 6 (12.8%) | 2 (4.4%) | 0.50 |
| | 39% Prenone | 19 (41.3%) | 15 (31.9%) | 20 (44.4%) | |
| | 9% Preon | 4 (8.7%) | 5 (10.6%) | 4 (8.9%) | |
| | 38% Postnone | 18 (39.1%) | 19 (40.4%) | 15 (33.3%) | |
| | 4% Poston | 0 (0%) | 2 (4.3%) | 4 (8.9%) | |
| Ethnicityᵃ | 92% White | 44 (95.7%) | 41 (87.2%) | 42 (93.3%) | 0.35 |
| | 8% Other | 2 (4.3%) | 6 (12.8%) | 3 (6.7%) | |
| BMI, kg/m² | 27.8 ± 0.5 | 28.6 ± 0.8 | 27.3 ± 0.8 | 27.6 ± 1.0 | 0.58 |
| L-T4 duration of treatment, yᵃ | 11.9 ± 0.8 | 9.9 ± 1.3 | 15.9 ± 1.7 | 9.7 ± 0.9 | <0.001ᵇ,ᶜ |
| L-T4 duration at current dose, yᵃ | 1.63 ± 0.19 | 1.75 ± 0.35 | 1.50 ± 0.18 | 1.63 ± 0.40 | 0.86 |
| L-T4 dose, μg/kg | 1.44 ± 0.04 | 1.50 ± 0.07 | 1.32 ± 0.07 | 0.78 ± 0.08 | <0.001ᶜ,ᵈ |
| L-T4 dose change, μg/kg | | 0.14 ± 0.02 | −0.21 ± 0.03 | −0.64 ± 0.05 | <0.001ᵇ,ᶜ,ᵈ |
| TSH, mU/L | 2.21 ± 0.13 | 1.85 ± 0.25 | 3.93 ± 0.38 | 9.49 ± 0.80 | <0.001ᵇ,ᶜ,ᵈ |
| TSH change, mU/L | | −0.18 ± 0.33 | 1.60 ± 0.42 | 7.23 ± 0.86 | <0.001ᶜ,ᵈ |
| Free T4, ng/dL | 1.67 ± 0.03 | 1.79 ± 0.06 | 1.64 ± 0.07 | 1.34 ± 0.05 | <0.001ᶜ,ᵈ |
| Free T4 change, ng/dL | | 0.13 ± 0.07 | −0.04 ± 0.07 | −0.32 ± 0.06 | <0.001ᶜ,ᵈ |
| Free T3, pg/dL | 214 ± 4.2 | 201.4 ± 6.0 | 191.1 ± 6.2 | 184.1 ± 6.6 | 0.15 |
| Free T3 change, pg/dL | | −18.9 ± 9.3 | −14.4 ± 6.9 | −32.4 ± 7.9 | 0.27 |
Values are mean ± standard error of the mean. Change variables represent the differences between end of study and baseline for each arm. Differences between arms were tested with analysis of variance, and follow-up post hoc Tukey multiple comparisons were used to determine which arms were significantly different at the 5% level.
Abbreviations: Postnone, postmenopausal, no hormone treatment; Poston, postmenopausal on hormone treatment; Prenone, premenopausal, no hormone treatment; Preon, premenopausal on hormone treatment.
ᵃ Values are at baseline for each arm.
ᵇ Arm 1 vs Arm 2.
ᶜ Arm 2 vs Arm 3.
ᵈ Arm 1 vs Arm 3.
Health status and mood by intention to treat
At the end of the study, SF-36 Physical Functioning and POMS anger subscales were higher in the high-normal TSH compared with the low-normal TSH arm (49% vs 26% high, P = 0.03; 4.9 ± 0.7 vs 3.6 ± 0.6, P = 0.03), but these differences were not significant after correction for multiple testing. There were no other differences between the three arms in health status or mood measures (Table 2). Analyzing TSH, fT4, and fT3 as continuous variables, the SF-36 Mental Health subscale decreased by 0.33 point for each 1-mU/L increase in TSH (P = 0.05). There were no significant correlations between TSH, fT4, or fT3 and other health status or mood measures (Table 3).
Table 2.
End-of-Study Health Status and Mood Measures for Each Arm, Analyzed by Intention to Treat
| Measure | Arm 1 Low-Normal TSH | Arm 2 High-Normal TSH | Arm 3 Mildly Elevated TSH | P |
|---|---|---|---|---|
| Billewicz | ||||
| Billewicz Score | 2.8 ± 0.4 | 4.0 ± 0.5 | 4.1 ± 0.4 | 0.38 |
| Thyroid Disease Questionnaire | ||||
| Thyroid Disease Questionnaire weighted average | −1.6 ± 0.2 | −1.4 ± 0.2 | −1.9 ± 0.3 | 0.53 |
| SF-36 | ||||
| Mental component summary | 41.0 ± 0.9 | 39.8 ± 1.1 | 39.4 ± 1.0 | 0.66 |
| Physical component summary | 48.5 ± 0.7 | 50.6 ± 0.7 | 49.2 ± 0.8 | 0.28 |
| General health | 62.2 ± 1.6 | 64.3 ± 1.7 | 62.6 ± 2.1 | 0.86 |
| Mental health | 68.6 ± 1.6 | 66.0 ± 1.7 | 65.1 ± 1.7 | 0.12 |
| Vitality | 45.6 ± 2.7 | 50.5 ± 2.5 | 44.6 ± 2.7 | 0.36 |
| Bodily pain (BP)ᵃ | 21% high | 23% high | 33% high | 0.69 |
| Physical functioning (PF)ᵃ | 26% high | 49% high | 31% high | **0.03**ᵇ |
| Role physical (RP)ᵃ | 62% high | 79% high | 55% high | 0.08 |
| Social functioning (SF)ᵃ | 62% high | 53% high | 52% high | 0.62 |
| Role emotional (RE)ᵃ | 74% high | 74% high | 62% high | 0.43 |
| POMSᶜ | ||||
| Anger | 3.6 ± 0.6 | 4.9 ± 0.7 | 4.3 ± 0.6 | **0.03**ᵇ |
| Confusion | 6.1 ± 0.5 | 6.6 ± 0.5 | 6.0 ± 0.4 | 0.87 |
| Depression | 4.4 ± 0.9 | 5.2 ± 0.9 | 5.4 ± 0.8 | 0.83 |
| Fatigue | 7.4 ± 0.8 | 6.7 ± 0.9 | 6.9 ± 0.8 | 0.42 |
| Tension | 5.2 ± 0.5 | 6.7 ± 0.5 | 7.2 ± 0.7 | 0.22 |
| Vigor | 14.9 ± 0.9 | 16.8 ± 0.9 | 14.8 ± 0.9 | 0.83 |
| Affective Lability Score | ||||
| Bipolar | 0.6 ± 0.1 | 0.7 ± 0.1 | 0.7 ± 0.1 | 0.42 |
| Depression | 0.9 ± 0.1 | 0.9 ± 0.1 | 1.0 ± 0.1 | 0.69 |
| Elation | 0.7 ± 0.1 | 0.8 ± 0.1 | 0.8 ± 0.1 | 0.89 |
| Angerᵈ | 61% score >0 | 57% score >0 | 62% score >0 | 0.67 |
| Anxietyᵈ | 67% score >0 | 83% score >0 | 84% score >0 | 0.21 |
| Anxiety Depressionᵈ | 63% score >0 | 66% score >0 | 76% score >0 | 0.58 |
Values are mean ± standard error of the mean. Significant differences between arms are shown in bold. P values for continuous outcomes were adjusted for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, baseline TSH (low-normal vs high-normal), standard deviation of TSH values at all but first visit, baseline L-T4 dose, time on L-T4, time on L-T4 dose, and baseline value of outcome. P values for binary outcomes were adjusted for baseline TSH (low-normal vs high-normal) and baseline value of outcome. SF-36 and POMS outcomes were grouped, and multiple testing P value adjustments were made for these outcomes.
ᵃ For BP, PF, RP, SF, and RE, the distributions within each group were highly skewed. We used the highest observed values of the skewed SF-36 subscales as the cutoff points for producing a dichotomous measure. The highest observed values were BP 75, PF 100, RP 50, SF 80, and RE 50.
ᵇ Arms 1 and 2 were significantly different at the 5% level based on follow-up pairwise comparisons of all three arms based on Tukey multiple comparisons. However, these P values were not significant when we grouped outcomes from the same test together and applied multiple testing adjustments by using a Bonferroni correction to all individual Tukey adjusted P values comparing the three arms.
ᶜ POMS values (except for the vigor subscale) were natural log-transformed before analysis because the raw data were skewed. All values were increased by 1 before the transformation because of the presence of zeros.
ᵈ These scores were compared as the proportion positive between the groups, because the measures were skewed and contained a large proportion of zeros.
Table 3.
Correlations Between Changes in Thyroid Hormone Levels and Health Status and Mood Measures at End of Study
| Measure | fT4 Coefficient | fT4 P | fT3 Coefficient | fT3 P | TSH Coefficient | TSH P |
|---|---|---|---|---|---|---|
| Billewicz | ||||||
| Billewicz Score | −0.13 (−1.2 to 0.95) | 0.82 | 0.07 (−0.04 to 0.18) | 0.20 | 0.04 (−0.07 to 0.14) | 0.49 |
| Thyroid Disease Questionnaire | ||||||
| Thyroid Disease Questionnaire weighted average | 0.18 (−0.26 to 0.62) | 0.42 | 0.02 (−0.03 to 0.06) | 0.41 | −0.04 (−0.08 to 0.00) | 0.07 |
| SF-36 | ||||||
| Mental component summary | 0.22 (−1.96 to 2.41) | 0.84 | 0.11 (−0.13 to 0.34) | 0.37 | −0.05 (−0.27 to 0.17) | 0.66 |
| Physical component summary | 0.40 (−1.27 to 2.06) | 0.64 | 0.03 (−0.15 to 0.20) | 0.75 | −0.01 (−0.18 to 0.15) | 0.89 |
| General health | 0.24 (−3.35 to 3.84) | 0.89 | 0.09 (−0.29 to 0.47) | 0.65 | 0.04 (−0.32 to 0.41) | 0.82 |
| Mental health | 1.97 (−1.36 to 5.30) | 0.24 | 0.19 (−0.17 to 0.54) | 0.30 | **−0.33 (−0.66 to 0.00)** | **0.05** |
| Vitality | 1.67 (−3.30 to 6.63) | 0.51 | 0.20 (−0.34 to 0.74) | 0.46 | −0.14 (−0.65 to 0.36) | 0.57 |
| Bodily pain (BP)ᵃ | −2% (−64% to 161%) | 0.96 | −8% (−20% to 3%) | 0.17 | 3% (−6% to 13%) | 0.53 |
| Physical functioning (PF)ᵃ | 72% (−35% to 378%) | 0.28 | 3% (−7% to 16%) | 0.54 | −4% (−13% to 5%) | 0.40 |
| Role physical (RP)ᵃ | 35% (−43% to 239%) | 0.50 | 3% (−7% to 14%) | 0.57 | −2% (−10% to 6%) | 0.61 |
| Social functioning (SF)ᵃ | 7% (−59% to 183%) | 0.90 | 7% (−4% to 19%) | 0.24 | 0% (−8% to 9%) | 0.96 |
| Role emotional (RE)ᵃ | 93% (−26% to 451%) | 0.19 | 7% (−3% to 19%) | 0.20 | −5% (−12% to 4%) | 0.27 |
| POMSᵇ | ||||||
| Anger | −0.25 (−0.55 to 0.06) | 0.11 | −0.0006 (−0.0319 to 0.0306) | 0.97 | 0.02 (−0.01 to 0.05) | 0.21 |
| Confusion | −0.04 (−0.20 to 0.13) | 0.64 | 0.001 (−0.016 to 0.018) | 0.89 | 0.004 (−0.012 to 0.021) | 0.58 |
| Depression | −0.17 (−0.50 to 0.17) | 0.32 | −0.01 (−0.04 to 0.03) | 0.72 | 0.01 (−0.02 to 0.04) | 0.51 |
| Fatigue | −0.002 (−0.286 to 0.282) | 0.99 | −0.003 (−0.032 to 0.026) | 0.84 | 0.002 (−0.025 to 0.03) | 0.86 |
| Tension | −0.14 (−0.31 to 0.04) | 0.13 | −0.01 (−0.03 to 0.01) | 0.26 | 0.005 (−0.012 to 0.022) | 0.58 |
| Vigor | −0.54 (−2.54 to 1.45) | 0.59 | −0.09 (−0.29 to 0.11) | 0.37 | 0.03 (−0.16 to 0.23) | 0.73 |
| Affective Lability Score | ||||||
| Bipolar | 0.02 (−0.15 to 0.18) | 0.85 | −0.002 (−0.018 to 0.014) | 0.80 | −0.002 (−0.018 to 0.014) | 0.79 |
| Depression | −0.05 (−0.22 to 0.12) | 0.60 | −0.01 (−0.02 to 0.01) | 0.48 | 0.003 (−0.013 to 0.020) | 0.71 |
| Elation | −0.02 (−0.18 to 0.14) | 0.80 | −0.005 (−0.021 to 0.011) | 0.56 | 0.0004 (−0.0153 to 0.016) | 0.96 |
| Angerᶜ | 29% (−48% to 233%) | 0.59 | −1% (−10% to 9%) | 0.83 | 4% (−4% to 13%) | 0.41 |
| Anxietyᶜ | 2% (−67% to 231%) | 0.97 | −4% (−15% to 8%) | 0.49 | 5% (−5% to 16%) | 0.37 |
| Anxiety Depressionᶜ | −33% (−80% to 111%) | 0.51 | 2% (−10% to 15%) | 0.81 | 0% (−9% to 11%) | >0.99 |
Correlations [95% confidence intervals (CIs)] with continuous outcomes were modeled with multiple linear regressions adjusting for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, standard deviation of TSH values at all but first visit, baseline L-T4 dose, time on L-T4, time on L-T4 dose, baseline hormone value, and baseline value of outcome. The magnitude of the coefficient indicates the estimated change in the outcome for each 1-unit increase in the study change (end of study – baseline) of fT4 or TSH and a 10-unit increase in the study change of fT3. Correlations (95% CIs) with binary outcomes were modeled with multiple logistic regressions adjusting for the baseline value of the hormone and the outcome of interest. Coefficients were transformed to estimate the percentage change in the predicted odds of the measure for a 1-unit increase in the study change (end of study – baseline) of fT4 or TSH and a 10-unit increase in the study change of fT3. The transformed coefficients are estimates of the risk ratios associated with the 1- or 10-unit increase in the study change of the respective hormone level. A positive coefficient indicates that the measure increased with increasing hormone levels, whereas a negative coefficient indicates that the measure decreased with increasing hormone levels. Separate models were run for each hormone. Significant coefficients are shown in bold with corresponding P values.
ᵃ For these variables, the distributions within each group were highly skewed. We used the highest observed values of the skewed SF-36 subscales as the cutoff points for producing a dichotomous measure. The highest observed values were BP 75, PF 100, RP 50, SF 80, and RE 50.
ᵇ POMS values (except for the vigor subscale) were natural log-transformed before analysis because the raw data were skewed. All values were increased by 1 before the transformation because of the presence of zeros. The magnitude of the coefficient indicates the estimated change in the natural log of the measure plus 1 with a 1-unit (10 units for fT3) increase.
ᶜ These scores were compared as the proportion positive between the groups, because the measures were skewed and contained a large proportion of zeros.
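The two transformations mentioned in the Table 3 notes can be sketched in a few lines of Python. The notes do not give explicit formulas, so the conversion of a logistic regression coefficient to a percentage change in odds is assumed here to be the standard (exp(β × units) − 1) × 100; the function names are hypothetical.

```python
import math

def percent_change_in_odds(beta: float, unit_increase: float = 1.0) -> float:
    """Convert a logistic regression coefficient to a percentage change in odds.

    Assumes the standard conversion (exp(beta * units) - 1) * 100; use
    unit_increase=10 for a 10-unit change, as described for fT3.
    """
    return (math.exp(beta * unit_increase) - 1.0) * 100.0

def log_plus_one(score: float) -> float:
    """Natural log of a score after adding 1, so that zero scores are defined."""
    return math.log(score + 1.0)
```

With this conversion, a coefficient of zero corresponds to no change in odds, and a zero POMS score maps to zero on the transformed scale.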
Cognitive tests by intention to treat
At the end of the study, the Letter Cancellation Test percentage with no errors and 1-back number correct on target were worse in the mildly elevated TSH compared with the low-normal TSH arm (11% vs 30%, P = 0.02; 68% vs 84%, P = 0.02), but these differences were not significant after correction for multiple testing. There were no other differences between the three arms in cognitive outcomes (Table 4). With TSH, fT4, and fT3 analyzed as continuous variables, there were a few correlations between fT4 or fT3 and individual outcomes, but only one remained significant after correction for multiple comparisons (Pursuit Rotor Trial 3 time on target inversely related to fT4 levels) (Table 5).
Table 4.
End-of-Study Cognitive Measures for Each Arm Analyzed by Intention to Treat
| Test | Arm 1 Low-Normal TSH | Arm 2 High-Normal TSH | Arm 3 Mildly Elevated TSH | P | |
|---|---|---|---|---|---|
| Executive function | Letter Cancellation Test | ||||
| Time, s | 102.0 ± 3.3 | 102.6 ± 3.1 | 100.4 ± 3.0 | 0.64 | |
| % With no errors | 30% | 26% | 11% | **0.02**ᵃ | |
| Trail Making Test | |||||
| Time, s | 23.3 ± 1.0 | 23.2 ± 1.0 | 21.9 ± 0.8 | 0.26 | |
| ABC time, s | 56.7 ± 2.8 | 59.9 ± 3.1 | 54.9 ± 2.9 | 0.70 | |
| % With errors | 11% | 9% | 11% | 0.91 | |
| % With ABC errors | 28% | 26% | 22% | 0.76 | |
| Iowa Gambling Test | |||||
| Net-1 | 0.7 ± 1.6 | −0.7 ± 1.1 | −2.4 ± 1.1 | 0.48 | |
| Net-2 | 5.4 ± 1.3 | 4.9 ± 1.3 | 4.0 ± 1.3 | 0.73 | |
| Net-3 | 7.2 ± 1.6 | 8.2 ± 1.3 | 6.8 ± 1.5 | 0.60 | |
| Net-4 | 9.5 ± 1.6 | 7.5 ± 1.4 | 9.1 ± 1.5 | 0.58 | |
| Net-5 | 7.9 ± 1.6 | 7.5 ± 1.5 | 8.8 ± 1.6 | 0.84 | |
| Working memory | N-Back number correct on target | ||||
| 1-Backᵇ | 84% | 75% | 68% | **0.02**ᵃ | |
| 2-Backᵇ | 65% | 65% | 70% | 0.81 | |
| 3-Back | 11.7 ± 0.3 | 11.5 ± 0.4 | 11.5 ± 0.4 | 0.75 | |
| N-Back number incorrect nontarget | |||||
| 1-Backᵇ | 14% | 18% | 20% | 0.43 | |
| 2-Backᵇ | 23% | 25% | 34% | 0.31 | |
| 3-Back | 3.0 ± 0.2 | 3.5 ± 0.3 | 3.1 ± 0.3 | 0.38 | |
| Subject-Ordered Pointing errors | |||||
| 6 | 0.4 ± 0.1 | 0.5 ± 0.1 | 0.4 ± 0.1 | 0.68 | |
| 8 | 0.9 ± 0.1 | 0.8 ± 0.1 | 0.9 ± 0.1 | 0.73 | |
| 10 | 1.1 ± 0.1 | 1.1 ± 0.1 | 1.1 ± 0.1 | 0.60 | |
| 12 | 1.5 ± 0.2 | 1.5 ± 0.2 | 1.5 ± 0.2 | 0.80 | |
| Declarative memory | Paragraph Recall | ||||
| Immediate | 13.3 ± 0.4 | 13.5 ± 0.5 | 14.0 ± 0.4 | 0.70 | |
| 30-Min delay | 11.9 ± 0.4 | 12.1 ± 0.5 | 12.8 ± 0.4 | 0.68 | |
| Motor learning | Pursuit Rotor Trial | ||||
| Time on target, s | |||||
| 1 | 38.7 ± 2.1 | 34.8 ± 2.0 | 41.5 ± 2.2 | 0.51 | |
| 2 | 39.6 ± 2.3 | 37.0 ± 2.0 | 43.8 ± 2.1 | 0.58 | |
| 3 | 39.9 ± 2.3 | 37.4 ± 1.9 | 44.0 ± 2.2 | 0.44 | |
| 4 | 41.8 ± 2.3 | 39.4 ± 1.9 | 45.0 ± 2.2 | 0.68 | |
| Motor Sequence Learning Test | |||||
| Total movement time, s | 1114.4 ± 43.7 | 1125.3 ± 36.8 | 1019.6 ± 35.9 | 0.46 |
Values are mean ± standard error of the mean. Significant differences between arms are shown in bold. Individual tests are grouped by cognitive subdomains (first column). P values for continuous outcomes were adjusted for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, baseline TSH (low-normal vs high-normal), standard deviation of TSH values at interim and last visits, baseline L-T4 dose, time on L-T4, time on L-T4 dose, and baseline value of outcome. P values for binary outcomes were adjusted for baseline TSH (low-normal vs high-normal) and baseline value of outcome. Letter Cancellation and N-Back outcomes were grouped, and multiple testing P value adjustments were made for these outcomes.
a. Arms 1 and 3 were significantly different at the 5% level based on follow-up pairwise comparisons of all three arms based on Tukey multiple comparisons. However, these P values were not significant when we grouped outcomes from the same test together and applied multiple testing adjustments by using Bonferroni correction to all individual Tukey adjusted P values comparing the three arms.
b. These values were calculated as proportion of subjects for whom each measure was ≥15 (for correct on target) or >0 (for incorrect nontarget), because there were floor effects.
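The Bonferroni step described in the Table 4 footnotes can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names are hypothetical, and the number of tests grouped into each family is not stated in the text, so the sketch only asks how large a family must be before an unadjusted P of 0.02 stops being significant at the 0.05 level.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each P value by the family size, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def smallest_family_losing_significance(p, alpha=0.05):
    """Smallest family size m for which the adjusted value p * m is no longer below alpha."""
    if p <= 0:
        raise ValueError("p must be positive")
    m = 1
    while p * m < alpha:
        m += 1
    return m


# P = 0.02 is the value reported for two outcomes in Table 4.
# The family size used by the authors is an unknown here, so we solve for it.
print(smallest_family_losing_significance(0.02))  # 3
```

Under this sketch, any grouping of three or more tests is enough to move P = 0.02 above the 0.05 threshold, which is in line with the reported loss of significance after adjustment.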
Table 5.
Correlations Between Changes in Thyroid Hormone Levels and Cognitive Measures at End of Study
| Cognitive subdomain | Test | fT4 Coefficient | fT4 P | fT3 Coefficient | fT3 P | TSH Coefficient | TSH P |
|---|---|---|---|---|---|---|---|
| Executive function | Letter Cancellation Test | | | | | | |
| | Time, s | 1.85 (−5.37 to 9.07) | 0.61 | 0.60 (−0.14 to 1.33) | 0.11 | −0.09 (−0.80 to 0.62) | 0.80 |
| | % With no errors | 13% (−57% to 182%) | 0.80 | −3% (−13% to 7%) | 0.56 | −9% (−20% to 2%) | 0.13 |
| | Trail Making Test | | | | | | |
| | Time, s | 1.04 (−1.04 to 3.13) | 0.32 | −0.10 (−0.31 to 0.11) | 0.34 | −0.03 (−0.24 to 0.17) | 0.75 |
| | ABC time, s | −4.14 (−10.33 to 2.05) | 0.19 | 0.12 (−0.52 to 0.75) | 0.72 | −0.04 (−0.65 to 0.57) | 0.89 |
| | % With errors | −23% (−82% to 177%) | 0.70 | −5% (−18% to 8%) | 0.45 | 3% (−10% to 14%) | 0.66 |
| | % With ABC errors | −11% (−66% to 121%) | 0.81 | −8% (−18% to 1%) | 0.10 | −5% (−14% to 4%) | 0.29 |
| | Iowa Gambling Test | | | | | | |
| | Net-1 | −0.24 (−3.91 to 3.43) | 0.90 | −0.40 (−0.77 to −0.03) | 0.03 a | 0.09 (−0.27 to 0.46) | 0.61 |
| | Net-2 | −1.73 (−5.31 to 1.84) | 0.34 | −0.23 (−0.60 to 0.14) | 0.21 | −0.01 (−0.36 to 0.34) | 0.97 |
| | Net-3 | 0.49 (−3.09 to 4.08) | 0.79 | 0.09 (−0.28 to 0.45) | 0.65 | 0.10 (−0.25 to 0.45) | 0.58 |
| | Net-4 | −1.88 (−5.72 to 1.97) | 0.34 | −0.51 (−0.89 to −0.12) | 0.01 a | 0.001 (−0.374 to 0.377) | >0.99 |
| | Net-5 | −0.39 (−4.25 to 3.48) | 0.84 | −0.21 (−0.61 to 0.19) | 0.30 | 0.07 (−0.31 to 0.44) | 0.73 |
| Working memory | N-Back number correct on target | | | | | | |
| | 1-Back b | 252% (18% to 1118%) | 0.03 a | 9% (−2% to 23%) | 0.14 | −7% (−14% to 2%) | 0.11 |
| | 2-Back b | 1% (−60% to 159%) | >0.99 | 7% (−3% to 20%) | 0.20 | −2% (−10% to 8%) | 0.70 |
| | 3-Back | 0.17 (−0.76 to 1.10) | 0.71 | −0.04 (−0.14 to 0.05) | 0.36 | 0.05 (−0.04 to 0.14) | 0.31 |
| | N-Back number incorrect nontarget | | | | | | |
| | 1-Back b | −22% (−77% to 136%) | 0.67 | 0% (−11% to 11%) | 0.95 | 3% (−7% to 14%) | 0.51 |
| | 2-Back b | 27% (−52% to 236%) | 0.63 | 3% (−8% to 14%) | 0.63 | 5% (−4% to 14%) | 0.29 |
| | 3-Back | −0.28 (−1.07 to 0.51) | 0.48 | 0.05 (−0.04 to 0.13) | 0.27 | −0.01 (−0.09 to 0.06) | 0.74 |
| | Subject-Ordered Pointing errors | | | | | | |
| | 6 | 0.02 (−0.14 to 0.18) | 0.80 | −0.01 (−0.02 to 0.01) | 0.52 | 0.01 (−0.01 to 0.02) | 0.27 |
| | 8 | 0.11 (−0.16 to 0.38) | 0.42 | 0.0003 (−0.0280 to 0.0286) | 0.98 | 0.002 (−0.024 to 0.029) | 0.88 |
| | 10 | 0.10 (−0.14 to 0.35) | 0.42 | −0.003 (−0.029 to 0.023) | 0.82 | 0.02 (−0.01 to 0.04) | 0.18 |
| | 12 | 0.05 (−0.28 to 0.39) | 0.75 | −0.02 (−0.05 to 0.02) | 0.27 | −0.003 (−0.036 to 0.029) | 0.83 |
| Declarative memory | Paragraph Recall | | | | | | |
| | Immediate | −0.33 (−1.33 to 0.67) | 0.52 | −0.02 (−0.12 to 0.08) | 0.71 | 0.04 (−0.05 to 0.14) | 0.38 |
| | 30-min delay | −0.51 (−1.53 to 0.51) | 0.32 | 0.01 (−0.10 to 0.11) | 0.86 | 0.08 (−0.02 to 0.18) | 0.11 |
| Motor learning | Pursuit Rotor Trial | | | | | | |
| | Time on target, s | | | | | | |
| | 1 | −5.13 (−9.77 to −0.48) | 0.03 a | 0.10 (−0.40 to 0.59) | 0.70 | 0.18 (−0.30 to 0.65) | 0.47 |
| | 2 | −5.78 (−10.62 to −0.93) | 0.02 a | 0.16 (−0.35 to 0.68) | 0.53 | 0.22 (−0.29 to 0.72) | 0.40 |
| | 3 | −6.70 (−11.25 to −2.14) | 0.004 c | 0.09 (−0.40 to 0.58) | 0.72 | 0.30 (−0.17 to 0.77) | 0.21 |
| | 4 | −2.19 (−7.09 to 2.71) | 0.38 | 0.10 (−0.41 to 0.61) | 0.70 | 0.15 (−0.34 to 0.65) | 0.54 |
| | Motor Sequence Learning Test | | | | | | |
| | Total movement time, s | 31.00 (−14.82 to 76.82) | 0.18 | 2.31 (−2.45 to 7.07) | 0.34 | −1.08 (−5.74 to 3.58) | 0.65 |
Correlations [95% confidence intervals (CIs)] with continuous outcomes were modeled with multiple linear regressions adjusting for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, standard deviation of TSH values at all but first visit, baseline L-T4 dose, time on L-T4, time on L-T4 dose, baseline hormone value, and baseline value of outcome. The magnitude of the coefficient indicates the estimated change in the outcome for each 1-unit increase in the study change (end of study – baseline) of fT4 or TSH and a 10-unit increase in the study change of fT3. Correlations (95% CIs) with binary outcomes were modeled with multiple logistic regressions adjusting for baseline values of the hormone and the outcome of interest. Coefficients were transformed to estimate the percentage change in the predicted odds of the measure for a 1-unit increase in the study change (end of study – baseline) of fT4 or TSH, and a 10-unit increase in the study change of fT3. The transformed coefficients are estimates of the risk ratios associated with the 1- or 10-unit increase in the study change of respective hormone level. A positive coefficient indicates that the measure increased with increasing hormone levels, whereas a negative coefficient indicates that the measure decreased with increasing hormone levels. Separate models were run for each hormone. Significant coefficients are shown in bold with corresponding unadjusted P values. For each set of related outcome measures, multiple testing adjustments were applied to all the individual P values from models adjusting for the same hormone type.
a. P was not significant at the 0.05 level when we applied multiple testing adjustments grouping outcomes from the same test together by Bonferroni correction.
b. These values were calculated as proportion of subjects for whom each measure was ≥15 (for correct on target) or >0 (for incorrect nontarget), because there were floor effects.
c. P was still significant at the 0.05 level when we applied multiple testing adjustments grouping outcomes from the same test together by Bonferroni correction.
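The percentage coefficients reported for binary outcomes in Table 5 follow from the standard relationship between a logistic regression coefficient and the odds: a coefficient β per unit of predictor corresponds to a (exp(β) − 1) × 100% change in the odds. The sketch below is a hedged illustration of that conversion only. The helper names and the `units` parameter are hypothetical, and the fitted coefficients themselves are not given in the text, so the example works backward from a reported percentage.

```python
import math


def percent_change_in_odds(beta, units=1.0):
    """Percentage change in the odds for a `units` increase in the predictor."""
    return (math.exp(beta * units) - 1.0) * 100.0


def beta_from_percent_change(pct, units=1.0):
    """Inverse: per-unit logistic coefficient implied by a percentage change in odds."""
    return math.log(1.0 + pct / 100.0) / units


# Table 5 reports 252% for 1-Back correct on target versus the change in fT4.
beta = beta_from_percent_change(252.0)  # implied log-odds coefficient, log(3.52)
odds_ratio = math.exp(beta)             # implied odds ratio, 3.52
print(round(beta, 3), round(odds_ratio, 2))
```

Read this way, the reported 252% (18% to 1118%) corresponds to an odds ratio of about 3.52 with a confidence interval of about 1.18 to 12.18. For fT3, where the table expresses the effect per 10-unit increase, `units=10` would be passed with a per-unit coefficient.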
Analyses by actual TSH arm at the end of the study
By the actual TSH arm at the end of the study, 57 subjects had TSH levels in the low-normal range, 28 in the high-normal range, and 53 in the mildly elevated range (Supplemental Table 1). Subjects did not differ in terms of any baseline demographic, clinical, or thyroid hormone variables. Mean L-T4 doses at the end of the study were progressively lower in the three arms (1.52 ± 0.06, 1.10 ± 0.10, and 0.92 ± 0.08 μg/kg/day, respectively, P < 0.001), whereas mean TSH levels were progressively higher (1.34 ± 0.08, 3.74 ± 0.12, and 9.74 ± 0.63 mU/L, P < 0.001). Mean fT4 and fT3 levels were lower in the mildly elevated TSH arm (1.89 ± 0.06, 1.44 ± 0.08, and 1.35 ± 0.04 ng/dL, respectively, P < 0.001; 206.2 ± 5.6, 196.0 ± 7.7, and 175.2 ± 5.4 pg/dL, P < 0.001). Thirty-three subjects in the low-normal TSH arm (72%), 19 in the high-normal TSH arm (40%), and 44 in the mildly elevated TSH arm (98%) had low fT3 levels (82 to 209 pg/dL).
By the actual TSH arm at the end of the study, the SF-36 Bodily Pain subscale was higher in the mildly elevated TSH arm than in the high-normal TSH arm (34% vs 11% high, P = 0.03), and the 1-back number correct on target was lower in the high-normal TSH arm than in the low-normal TSH arm (58% vs 86%, P = 0.002) (Supplemental Tables 2 and 3); neither was significant after correction for multiple testing. There were no other differences between the three arms in health status, mood, or cognitive measures.
Subjects’ perceptions of L-T4 doses
At the final study visit, subjects were asked whether they thought their L-T4 doses at the end of the study were higher, lower, or unchanged from the start of the study and which of the two doses they preferred. Subjects were not able to accurately ascertain changes in L-T4 doses (P = 0.54) (Supplemental Table 4). However, the majority preferred whichever L-T4 dose they thought was higher (P < 0.001): 68% preferred their dose at the end of the study when they thought their dose had been increased during the study, whereas 96% preferred their dose at the beginning of the study when they thought their dose had been lowered during the study.
Effect size calculations
We performed effect size calculations by using results for the SF-36 mental component summary and mental health scales, POMS depression scale, 3-back correct on target, and Iowa Gambling Task-5, outcomes affected by mild thyroid dysfunction in our previous studies (7, 29, 30). The necessary sample sizes to achieve 80% power at a 5% level of significance were 659 to 5442 subjects (28) (data not shown).
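To show why small effects translate into sample sizes in the hundreds or thousands, the following sketch applies the common normal-approximation formula for a two-group comparison of means, n per group = 2(z₁₋α/₂ + z_power)² / d², where d is the standardized effect size. This is an assumption made for illustration: the text does not state which formula from Cohen (28) the authors used, the study had three arms, and the d values below are hypothetical rather than the study's observed effects.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect standardized effect size d
    in a two-sided, two-group comparison (normal approximation)."""
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value for two-sided alpha
    z_power = NormalDist().inv_cdf(power)              # quantile for the desired power
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 / d ** 2)


# Hypothetical effect sizes, from medium to very small.
for d in (0.5, 0.2, 0.1):
    print(d, n_per_group(d))
```

Because the required n scales with 1/d², halving the effect size roughly quadruples the number of subjects needed, which is the pattern behind the wide range of sample sizes reported above.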
Discussion
In this cohort of L-T4 treated subjects, we found little evidence that altering L-T4 doses in a randomized, blinded fashion to achieve TSH levels in the low-normal, high-normal, or mildly elevated range affected health status, mood, or cognitive function over 6 months. After correction for multiple testing, no outcomes were significantly different when the data were analyzed by discrete groups, either as intention to treat or by actual TSH arm achieved at the end of the study. When the data were analyzed by continuous variables, the SF-36 mental health subscale was inversely correlated with TSH levels, but the magnitude of this correlation was small, and there were no other significant findings.
Most published studies of subclinical hypothyroid subjects are observational (1), and the most recent and largest failed to find significant quality of life, mood, or cognitive effects (31–36). These studies were often limited by the use of screening cognitive batteries, which are not designed to detect subtle defects in targeted cognitive domains likely to be affected by altered thyroid status. We used sensitive, specific cognitive measures to circumvent these limitations, based on human studies indicating that memory and executive function are preferentially affected, as well as animal studies of thyroid hormone and its receptor distribution in the brain (1).
Seven previous studies have assessed effects of L-T4 therapy on symptoms or neurocognitive outcomes in patients with subclinical hypothyroidism (3, 5, 6, 8–11). Three reported improvements in depression or memory after 6 months, but they were open-label (3, 6, 9), although one also reported a neural substrate for thyroid effects in the frontal cortex by functional magnetic resonance imaging (6). Four were blinded and found minor or no effects after 3 to 12 months (5, 8, 10, 11). Our study extends these findings with detailed measures of cognitive areas that have not been intensively studied. With the exception of Stott et al.’s (11) recent study, where the primary outcome was a tiredness score, ours is also the largest interventional study in subclinical hypothyroidism.
The literature regarding neurocognitive measures in euthyroid or L-T4-treated subjects has similar limitations. Most studies were observational, with only two small interventional trials of L-T4 therapy in subjects with normal TSH levels treated for 8 to 12 weeks. Neither found effects on hypothyroid symptoms, quality of life, psychological function, or limited measures of cognitive function (2, 4). Our results in a larger group of subjects treated for a longer time period extend these findings. Our findings do not support the idea that lowering TSH levels <2.50 mU/L (37) improves quality of life, mood, or cognitive function.
A major strength of our study was our focus on executive function. This cognitive domain has not been extensively studied in thyroid disease, because rodent models do not adequately represent executive functions in humans, and many laboratory measures of executive function are insensitive to real-world scenarios. We included the Iowa Gambling Task because this test of executive function assesses decision making under uncertainty and models real-world behavior (38). L-T4-treated patients often complain of problems in this area, but our results do not corroborate objective changes in executive function when L-T4 doses are altered.
Another major strength of our study was the blinded nature of our intervention. When we queried subjects, they could not accurately identify how their L-T4 doses had been altered, but the majority preferred whichever dose they perceived to be the higher dose, confirming an intrinsic bias toward higher L-T4 doses. Studies also indicate that self-knowledge of a thyroid disorder impairs psychological well-being regardless of the TSH level (30, 39), which would bias unblinded studies.
We found a high prevalence of low serum fT3 levels at baseline and at the end of the study in all three arms. However, fT3 levels did not correlate with our outcomes. Previous reports have also described a high prevalence of low T3 levels in L-T4-treated subjects (40). However, studies of liothyronine add-on or monotherapy in hypothyroidism have not shown improvements in quality of life, mood, or cognitive outcomes (40). Additional studies have suggested that polymorphisms in deiodinase or brain thyroid hormone transporter genes correlate with psychological scores and response to liothyronine, so subsets of L-T4-treated patients may respond to L-T3 (40).
Our study also has limitations. A major limitation was our sample size, and it is possible that we were underpowered to detect small effects. To address this problem, we performed an effect size calculation, which showed that large numbers of subjects (>600, depending on outcome) would need to be studied to reach statistical significance. The small magnitude of our effects suggests that clinically meaningful alterations are unlikely, but it remains possible that subtle effects were missed. We did not include an untreated euthyroid control group, so we cannot ascertain whether our subjects had decrements in quality of life, mood, or cognition at baseline compared with the general population. However, we previously published results of the same tests of quality of life, mood, and cognitive function in L-T4-treated subjects compared with euthyroid control subjects and found mild decrements in the SF-36 (mental component summary, mental health subscale, and vitality subscale) without differences in mood or cognitive function (30). Therefore, we suspect that subjects in the current study had slightly lower quality of life than matched euthyroid subjects. We performed a large number of correlations; although we accounted for this in our analyses, it is possible that some of our minor findings were due to chance. Most of our subjects were women and were younger and slimmer than the U.S. population. Most of our subjects were Caucasian. Our subjects were heterogeneous in terms of thyroid diagnosis and length of L-T4 treatment. We limited our study to 6 months to optimize subject retention, recognizing that this is sufficient time to observe changes in our outcomes. Many of our subjects experienced variations in TSH levels at interim visits that necessitated L-T4 dose adjustments, which we accounted for in our analysis. One-third of our subjects did not achieve target TSH levels, particularly in the high-normal TSH group. To address this limitation, we conducted separate intention-to-treat and actual end-of-study analyses, as well as analyses using changes in TSH and thyroid hormones as continuous variables. These complementary analyses showed similar results, strengthening our conclusions. In addition, we note that regardless of the ultimate TSH attained, L-T4 doses were altered in each arm, consistent with the study design. Because patients often request changes in their L-T4 doses regardless of their TSH levels, an interpretation of our results based on L-T4 dose adjustments is a valuable perspective for clinical practice. We attempted to collect blood samples at a consistent time of day, but this was not always possible. In healthy and L-T4-treated subjects, TSH levels decrease slightly between 07:00 and 09:00 and then remain stable until the evening (41). Finally, we limited our cognitive testing to executive function and memory, although studies do not indicate major effects in other areas (1).
In summary, we found no relevant differences in health status, mood, memory, or executive functions in hypothyroid subjects when L-T4 doses were altered in a randomized, blinded fashion to achieve TSH levels in the low-normal, high-normal, or mildly elevated range. Given our limited sample size, additional studies would be helpful, particularly in targeted populations (e.g., symptomatic subjects, subjects with low fT3 levels, or subjects with genetic polymorphisms that affect thyroid hormone action). In the absence of definitive data, reasonable expectations should be discussed with treated hypothyroid patients who report symptoms in these areas and request higher L-T4 doses or alternative thyroid hormone preparations.
Acknowledgments
We thank the staff of the OHSU Clinical and Translational Research Center for excellent patient care and research support and the Biostatistics & Design Program for data analysis expertise.
Financial Support: This work was supported by National Institutes of Health Grants R01 DK075496 (to M.H.S.) and UL1 RR024120 (to OHSU).
Clinical Trial Information: ClinicalTrials.gov no. NCT00565864 (registered November 30, 2007).
Disclosure Summary: The authors have nothing to disclose.
Glossary
Abbreviations:
- ALS: Affective Lability Scale
- BMI: body mass index
- CV: coefficient of variation
- fT3: free triiodothyronine
- fT4: free thyroxine
- L-T4: levothyroxine
- OHSU: Oregon Health & Science University
- POMS: Profile of Mood States
- SF-36: 36-Item Short Form Health Survey
- TSH: thyrotropin
- WAIS-R: Wechsler Adult Intelligence Scale–Revised
References
- 1. Samuels MH. Thyroid disease and cognition. Endocrinol Metab Clin North Am. 2014;43(2):529–543.
- 2. Pollock MA, Sturrock A, Marshall K, Davidson KM, Kelly CJ, McMahon AD, McLaren EH. Thyroxine treatment in patients with symptoms of hypothyroidism but thyroid function tests within the reference range: randomised double blind placebo controlled crossover trial. BMJ. 2001;323(7318):891–895.
- 3. Bono G, Fancellu R, Blandini F, Santoro G, Mauri M. Cognitive and affective status in mild hypothyroidism and interactions with L-thyroxine treatment. Acta Neurol Scand. 2004;110(1):59–66.
- 4. Walsh JP, Ward LC, Burke V, Bhagat CI, Shiels L, Henley D, Gillett MJ, Gilbert R, Tanner M, Stuckey BG. Small changes in thyroxine dosage do not produce measurable changes in hypothyroid symptoms, well-being, or quality of life: results of a double-blind, randomized clinical trial. J Clin Endocrinol Metab. 2006;91(7):2624–2630.
- 5. Jorde R, Waterloo K, Storhaug H, Nyrnes A, Sundsfjord J, Jenssen TG. Neuropsychological function and symptoms in subjects with subclinical hypothyroidism and the effect of thyroxine treatment. J Clin Endocrinol Metab. 2006;91(1):145–153.
- 6. Zhu DF, Wang ZX, Zhang DR, Pan ZL, He S, Hu XP, Chen XC, Zhou JN. fMRI revealed neural substrate for reversible working memory dysfunction in subclinical hypothyroidism. Brain. 2006;129(Pt 11):2923–2930.
- 7. Samuels MH, Schuff KG, Carlson NE, Carello P, Janowsky JS. Health status, mood, and cognition in experimentally induced subclinical hypothyroidism. J Clin Endocrinol Metab. 2007;92(7):2545–2551.
- 8. Razvi S, Ingoe L, Keeka G, Oates C, McMillan C, Weaver JU. The beneficial effect of L-thyroxine on cardiovascular risk factors, endothelial function, and quality of life in subclinical hypothyroidism: randomized, crossover trial. J Clin Endocrinol Metab. 2007;92(5):1715–1723.
- 9. Correia N, Mullally S, Cooke G, Tun TK, Phelan N, Feeney J, Fitzgibbon M, Boran G, O’Mara S, Gibney J. Evidence for a specific defect in hippocampal memory in overt and subclinical hypothyroidism. J Clin Endocrinol Metab. 2009;94(10):3789–3797.
- 10. Parle J, Roberts L, Wilson S, Pattison H, Roalfe A, Haque MS, Heath C, Sheppard M, Franklyn J, Hobbs FD. A randomized controlled trial of the effect of thyroxine replacement on cognitive function in community-living elderly subjects with subclinical hypothyroidism: the Birmingham Elderly Thyroid study. J Clin Endocrinol Metab. 2010;95(8):3623–3632.
- 11. Stott DJ, Rodondi N, Kearney PM, Ford I, Westendorp RGJ, Mooijaart SP, Sattar N, Aubert CE, Aujesky D, Bauer DC, Baumgartner C, Blum MR, Browne JP, Byrne S, Collet TH, Dekkers OM, den Elzen WPJ, Du Puy RS, Ellis G, Feller M, Floriani C, Hendry K, Hurley C, Jukema JW, Kean S, Kelly M, Krebs D, Langhorne P, McCarthy G, McCarthy V, McConnachie A, McDade M, Messow M, O’Flynn A, O’Riordan D, Poortvliet RKE, Quinn TJ, Russell A, Sinnott C, Smit JWA, Van Dorland HA, Walsh KA, Walsh EK, Watt T, Wilson R, Gussekloo J; TRUST Study Group. Thyroid hormone therapy for older adults with subclinical hypothyroidism. N Engl J Med. 2017;376(26):2534–2544.
- 12. Spreen O, Strauss EA. General intellectual ability and assessment of premorbid intelligence. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:90–102.
- 13. Billewicz WZ, Chapman RS, Crooks J, Day ME, Gossage J, Wayne E, Young JA. Statistical methods applied to the diagnosis of hypothyroidism. Q J Med. 1969;38(150):255–266.
- 14. McMillan C, Bradley C, Razvi S, Weaver J. Evaluation of new measures of the impact of hypothyroidism on quality of life and symptoms: the ThyDQoL and ThySRQ. Value Health. 2008;11(2):285–294.
- 15. Spreen O, Strauss EA. Adaptive behavior and personality. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:612–616.
- 16. Spreen O, Strauss EA. Adaptive behavior and personality. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:644–646.
- 17. Harvey PD, Greenberg BR, Serper MR. The affective lability scales: development, reliability, and validity. J Clin Psychol. 1989;45(5):786–793.
- 18. Byrd DA, Touradji P, Tang MX, Manly JJ. Cancellation test performance in African American, Hispanic, and White elderly. J Int Neuropsychol Soc. 2004;10(3):401–411.
- 19. Lezak MD, Howieson DB, Loring DW. Orientation and attention. In: Lezak MD, Howieson DB, Loring DW, eds. Neuropsychological Assessment. New York, NY: Oxford University Press; 1995:371–374.
- 20. Singh V, Khan A. Heterogeneity in choices on Iowa Gambling Task: preference for infrequent-high magnitude punishment. Mind Soc. 2009;8(1):43–57.
- 21. Lezak MD, Howieson DB, Loring DW. Orientation and attention. In: Lezak MD, Howieson DB, Loring DW, eds. Neuropsychological Assessment. New York, NY: Oxford University Press; 1996:363–364.
- 22. Spreen O, Strauss EA. Executive functions. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:208–212.
- 23. Lezak M, Howieson DB, Loring DW. Memory I: tests. In: Lezak M, Howieson DB, Loring DW, eds. Neuropsychological Assessment. New York, NY: Oxford University Press; 1995:444–450.
- 24. van Gorp WG, Altshuler L, Theberge DC, Mintz J. Declarative and procedural memory in bipolar disorder. Biol Psychiatry. 1999;46(4):525–531.
- 25. Spreen O, Strauss EA. Motor tests. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms and Commentary. New York, NY: Oxford University Press; 1998:577–599.
- 26. Spencer CA, Hollowell JG, Kazarosyan M, Braverman LE. National Health and Nutrition Examination Survey III thyroid-stimulating hormone (TSH)–thyroperoxidase antibody relationships demonstrate that TSH upper reference limits may be skewed by occult thyroid dysfunction. J Clin Endocrinol Metab. 2007;92(11):4236–4240.
- 27. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2015. Available at: http://www.R-project.org/. Accessed 5 June 2017.
- 28. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Mahwah, NJ: Lawrence Erlbaum Associates; 1988.
- 29. Samuels MH, Kolobova I, Smeraglio A, Niederhausen M, Janowsky JS, Schuff KG. Effects of thyroid function variations within the laboratory reference range on health status, mood, and cognition in levothyroxine-treated subjects. Thyroid. 2016;26(9):1173–1184.
- 30. Samuels MH, Kolobova I, Smeraglio A, Peters D, Janowsky JS, Schuff KG. The effects of levothyroxine replacement or suppressive therapy on health status, mood, and cognition. J Clin Endocrinol Metab. 2014;99(3):843–851.
- 31. Roberts LM, Pattison H, Roalfe A, Franklyn J, Wilson S, Hobbs FD, Parle JV. Is subclinical thyroid dysfunction in the elderly associated with depression or cognitive dysfunction? Ann Intern Med. 2006;145(8):573–581.
- 32. Wijsman LW, de Craen AJ, Trompet S, Gussekloo J, Stott DJ, Rodondi N, Welsh P, Jukema JW, Westendorp RG, Mooijaart SP. Subclinical thyroid dysfunction and cognitive decline in old age. PLoS One. 2013;8(3):e59199.
- 33. Fjaellegaard K, Kvetny J, Allerup PN, Bech P, Ellervik C. Well-being and depression in individuals with subclinical hypothyroidism and thyroid autoimmunity: a general population study. Nord J Psychiatry. 2015;69(1):73–78.
- 34. van de Ven AC, Netea-Maier RT, de Vegt F, Ross HA, Sweep FC, Kiemeney LA, Hermus AR, den Heijer M. Is there a relationship between fatigue perception and the serum levels of thyrotropin and free thyroxine in euthyroid subjects? Thyroid. 2012;22(12):1236–1243.
- 35. Klaver EI, van Loon HC, Stienstra R, Links TP, Keers JC, Kema IP, Kobold AC, van der Klauw MM, Wolffenbuttel BH. Thyroid hormone status and health-related quality of life in the LifeLines Cohort Study. Thyroid. 2013;23(9):1066–1073.
- 36. Engum A, Bjøro T, Mykletun A, Dahl AA. An association between depression, anxiety and thyroid function--a clinical fact or an artefact? Acta Psychiatr Scand. 2002;106(1):27–34.
- 37. Wartofsky L, Dickey RA. The evidence for a narrower thyrotropin reference range is compelling. J Clin Endocrinol Metab. 2005;90(9):5483–5488.
- 38. Winstanley CA, Clark L. Translational models of gambling-related decision-making. Curr Top Behav Neurosci. 2016;28:93–120.
- 39. Panicker V, Evans J, Bjøro T, Asvold BO, Dayan CM, Bjerkeset O. A paradoxical difference in relationship between anxiety, depression and thyroid function in subjects on and not on T4: findings from the HUNT study. Clin Endocrinol (Oxf). 2009;71(4):574–580.
- 40. Jonklaas J, Bianco AC, Bauer AJ, Burman KD, Cappola AR, Celi FS, Cooper DS, Kim BW, Peeters RP, Rosenthal MS, Sawka AM; American Thyroid Association Task Force on Thyroid Hormone Replacement. Guidelines for the treatment of hypothyroidism: prepared by the American Thyroid Association Task Force on Thyroid Hormone Replacement. Thyroid. 2014;24(12):1670–1751.
- 41. Roelfsema F, Veldhuis JD. Thyrotropin secretion patterns in health and disease. Endocr Rev. 2013;34(5):619–657.