PLOS One. 2020 Apr 3;15(4):e0230852. doi: 10.1371/journal.pone.0230852

A two-step procedure to generate utilities for the Infant health-related Quality of life Instrument (IQI)

Paul F M Krabbe 1,*, Ruslan Jabrayilov 1, Patrick Detzel 2, Livia Dainelli 2, Karin M Vermeulen 1, Antoinette D I van Asselt 1
Editor: Jing Tian 3
PMCID: PMC7122817  PMID: 32243445

Abstract

Background

Because of a lack of preference-based health-related quality of life (HRQoL) instruments suitable for infants aged 0–12 months, we previously developed the Infant QoL Instrument (IQI). The present study aimed to generate an algorithm to estimate utilities for the IQI.

Methods

Via an online survey, respondents from the general population and primary caregivers from China-Hong Kong, the UK, and the USA were presented 10 discrete choice scenarios based on the IQI classification system. An additional sample of respondents from the general population were also asked if they considered the examined health states to be worse than death. Coefficients for the IQI item levels were obtained with a conditional logit model based on the responses of the primary caregivers for IQI states only. These coefficients were then normalized using the rank-ordered logit model based on the responses from the general population who assessed “death” as a choice option. In this way, the values were rescaled from full health (1.0) to death (0.0), and consequently, they became suitable for the computation of quality-adjusted life years.

Results

The total sample consisted of 1409 members of the general population and 1229 primary caregivers. Results indicated that, out of the 7 IQI items (“sleeping,” “feeding,” “breathing,” “stooling/poo,” “mood,” “skin,” and “interaction”), “breathing” had the highest impact on the HRQoL of infants. Moreover, except for “stooling,” all item levels were statistically significant. The general population sample considered none of the health states as worse than death. The utility value for the worst health state was 0.015 (State 4444444).

Conclusions

The IQI is the first generic instrument to assess overall HRQoL in 0–1-year-old infants by providing values and utilities. Using discrete choice experiments, we demonstrated that it is possible to derive utilities of infant health states. The next step will be to collect IQI values in a clinical population of infants and to compare these values with those of other instruments.

Introduction

Regulatory authorities and governmental organizations generally require studies to evaluate the value of health interventions. Many of these bodies recommend using a summary measure of health outcome, such as quality-adjusted life years (QALYs), as the unit of health benefit in economic evaluations [1]. Central to the computation of QALYs is the “quality” component, which is mostly quantified in terms of concepts such as health-status or health-related quality of life (HRQoL) [2]. Such HRQoL measures suited for QALYs are expressed in a single metric for health states or conditions, anchored on a unidimensional scale. They are often classified as “generic” and “preference-based.” While “generic” means that such tools can be applied across a wide range of populations and interventions, therefore allowing comparisons among them, “preference-based” means that these measurement methods are used to arrive at values that place health states on a scale. These methods explicitly incorporate weights that reflect the importance attached to specific health items (also known as attributes, domains, dimensions, or indicators) [3,4,5]. Preference-based methods stemming from economics, such as the standard gamble and time trade-off, are constructed such that they directly produce values on a scale, where 0.0 is equal to death and 1.0 is full health, and these values can be applied in QALY computations, where they are called utilities. Other types of preference-based methods require extensions or additional exercises to normalize values because death does not appear on the scale.

To overcome the lack of generic preference-based HRQoL instruments suitable for infants aged 0–12 months, the Infant Quality of life Instrument (IQI) was developed recently [6]. For the selection of the relevant domains (items) for the IQI, a multi-step development process began by extracting candidate health concepts from relevant measures that were identified by searching the literature. Next, panels, with experts from Asia, Europe, New Zealand, and the United States of America (USA), and two surveys, with primary caregivers in New Zealand, Singapore, and the United Kingdom (UK), evaluated the relevance of the candidate health concepts, organized them into attributes based on their similarities, explored alternative attributes, and generated response scales. Additional interviews assessed the cross-cultural interpretability, parents’ understanding of health attributes, and the usability of the mobile application.

We also conducted a study in which the IQI was used in a preference-based method (discrete choice) to generate values [7]. In the present study, we went one step further and generated utilities that are applicable in the computation of QALYs. It is unclear which method is most appropriate to obtain values and utilities for adolescents or children. Several studies discourage the use of conventional economic valuation methods, such as the time trade-off, with this age group because caregivers (proxies) are apparently unwilling to trade off a child’s life years, leading to relatively high values for poor health states [8,9,10]. In addition, these methods are not only complex but are also associated with numerous theoretical violations, biases, and practical problems [11, 12]. Deriving utilities from the general public, who represent the societal perspective, is the prevailing approach in economic evaluations because the general public, being taxpayers and potential users of the healthcare system, are considered the most reasonable assessors [13].

The present study aimed to explain how utilities for the IQI health states were generated with a novel two-step choice-based modeling procedure that helps locate the position of death on the IQI scale [14]. Specifically, this process involved the following two steps: 1) deriving values for a set of IQI health states from primary caregivers of infants based on a discrete choice modeling exercise, 2) normalizing the values obtained in Step 1 to an anchored 0.0–1.0 scale using utilities derived from responses collected from a general population sample.

Methods

Instrument

The IQI includes 7 health items (“sleeping,” “feeding,” “breathing,” “stooling/poo,” “mood,” “skin,” and “interaction”), each relevant at each time point up to 1 year of age [5]. Each item consists of 4 levels, most of which are ranked by severity. For instance, the levels for “sleeping” are 1: sleeps well, 2: slightly affected sleep, 3: moderately affected sleep, and 4: severely disturbed sleep. Only for the item “skin” are the levels phrased qualitatively rather than quantitatively. The IQI can be administered through a mobile application (www.chateau-sante.com/iqi); its usability was previously tested on primary caregivers [5] and further improved in light of their opinions (Fig 1). For each health item, primary caregivers can select the level that best applies to their infant. In this way they “construct” an IQI health state that forms an overall health description expressed in 7 digits, e.g., 3231421, which would equate to “moderately affected sleep, slight feeding problems, moderate breathing problems, normal stool/poo, inconsolable crying, dry or red skin, highly playful/highly interactive”.

Fig 1. Infant Quality of life Instrument (IQI) health items and their levels (left: screenshot of the mobile application for the IQI).


Recruitment of respondents

Primary caregivers of infants and toddlers (0–3 years old) and people from the general population were recruited in China-Hong Kong, the UK, and the USA to conduct the main study. These countries were selected for practical reasons, since they are culturally different yet share one language, thus eliminating the need for translation at this phase, and enabling the analysis of possible cross-cultural differences in the results, and therefore improving generalizability. Clear instructions were given to all participants. While the instrument targets infants up to one year, in the survey we chose to include primary caregivers of 2- and 3-year-olds as well, to enable the recruitment of a larger sample. We assumed that the caregivers could recollect their experiences of the first year of their infant’s life quite easily. The general population sample included both parents (of children with varying ages) and respondents without children, to be as representative as possible. The latter were interviewed because they might think differently about the value of life in different health conditions.

In an additional study, a separate smaller sample was drawn in the USA from among members of the general population. This study was conducted to gain a better understanding of the severe IQI health states to explore and confirm our findings from the main study.

All respondents were contacted through a market research company (Survey Sampling International, SSI). Respondents who completed the entire survey received a small financial compensation from SSI. The rewards were defined by the company’s (SSI) internal agreements with the groups of respondents. The Medical Ethics Review Committee at the University Medical Center of Groningen issued a waiver for this study, indicating that the pertinent Dutch legislation (the Medical Research Involving Human Subjects Act) did not apply for this non-interventional study (METc2017.115).

Response task

Respondents in the main study were presented with 10 discrete choice scenarios in an online survey. They were requested to indicate which of the two hypothetical health states presented was better (Fig 2). The order of the items (e.g., sleeping, breathing, interaction) was randomized for every respondent. Prior to each paired comparison task, respondents were instructed about two assumptions: that the health states presented in the task would occur in the first year of life, and that it was uncertain what would happen after that year. In adult populations, the timespan of the comparison typically ends at death. As this does not seem appropriate for a younger population, we believe that by only describing the situation in the first year and not being explicit about what comes after, the focus of the comparison would be on the first year, as intended. After the paired comparison task, respondents were asked to indicate whether they considered either of the two health states to be worse than death.

Fig 2. Screenshot of the discrete choice task and the complementary “better or worse than death” task.


Discrete choice design

With the IQI classification system, a total of 4^7 = 16,384 health state classifications (7 items with 4 levels each) are possible. Consequently, 134,209,536 ((16,384 × 16,384 − 16,384)/2) unique IQI health state pairs can be devised. To select a smaller number of pairs from this large pool, several criteria were used. The first was based on the fact that comparisons containing a dominant health state, i.e., a state with all items at a better level than the comparator state (e.g., 2222222 vs. 3333333), are not ideal for selection, as they do not add information. Therefore, pairs with one health state dominating the other were excluded from the task. The second criterion was the selection of pairs with a certain overlap, deemed to facilitate comprehension. Specifically, we included pairs that varied on 4 items and overlapped on 3 (see Fig 2: only 4 of the items vary between Infant A and B). In the 4 items that varied, 2 represented better-off item levels in Alternative A than in Alternative B, and 2 represented worse-off item levels in Alternative A than in Alternative B. The third criterion was that, in at least half of the tasks, the maximum difference in item levels between the health states was set to 1. For example, Level 2 could be compared to Level 1 and 3, but not to Level 4. The rationale for restricting the range was to avoid comparisons of health states that were very different from each other. The remaining set of tasks, however, allowed larger differences. The above criteria of our study design were all programmed in MATLAB [15].
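The counts and selection criteria above can be illustrated with a short sketch. This is not the authors' MATLAB code; the function names are hypothetical, and the sketch assumes for simplicity that a lower level number is always better, which the paper notes does not hold for every item.

```python
from itertools import product

N_ITEMS, N_LEVELS = 7, 4

def all_states():
    """Every IQI state as a tuple of 7 item levels (1 to 4)."""
    return list(product(range(1, N_LEVELS + 1), repeat=N_ITEMS))

def dominates(a, b):
    """True if state a is no worse than b on every item and differs from b.
    Assumption: a lower level number is better on every item."""
    return a != b and all(x <= y for x, y in zip(a, b))

def is_candidate_pair(a, b, max_step=None):
    """Check the overlap and balance criteria for one pair (A, B)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    if len(diffs) != 4:                  # vary on 4 items, overlap on 3
        return False
    if sum(d < 0 for d in diffs) != 2:   # A better on 2 items, worse on 2
        return False
    if max_step is not None and max(abs(d) for d in diffs) > max_step:
        return False                     # optional cap on the level difference
    return True

states = all_states()
n_states = len(states)                        # 4**7
n_pairs = n_states * (n_states - 1) // 2      # unordered pairs of distinct states
print(n_states, n_pairs)
```

Note that a pair passing the balance criterion (two items better, two worse) can never contain a dominant state, so in this sketch the dominance check is implied by the second criterion.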

A discrete choice design generally entails selecting items from the full set of all possible health states. Accordingly, the health states at the top and bottom of the scale are often absent. The included health states are often “mild” and they are thus likely to be considered better than “death.” Therefore, to gain insight into the worst health states as well as their relationship to “death,” we conducted an additional discrete choice study. In this study, half of the participants were asked to compare two IQI health states composed of the worst item levels (i.e., severe problems) except for one item with Level 3 (S1 Fig). In total, (7 × 6)/2 = 21 such health state pairs are possible. The other half were asked to compare these worst health states (6 items with Level 4 and 1 item with Level 3) to the health state “death” (S2 Fig). In total, 7 such comparisons are possible.
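The 7 near-worst states and the 21 pairs they form can be enumerated directly. The sketch below is illustrative only; the variable and function names are hypothetical.

```python
from itertools import combinations

N_ITEMS = 7

def near_worst_states():
    """States with Level 4 on six items and Level 3 on the remaining one."""
    states = []
    for i in range(N_ITEMS):
        levels = [4] * N_ITEMS
        levels[i] = 3
        states.append("".join(str(level) for level in levels))
    return states

states = near_worst_states()
state_pairs = list(combinations(states, 2))        # Task 1: state vs. state
death_tasks = [(s, "death") for s in states]       # Task 2: state vs. death
print(len(states), len(state_pairs), len(death_tasks))
```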

Analyses

Coefficients for the IQI item levels were obtained with a conditional logit model (Stata, clogit) based on the responses of the primary caregivers (the same analysis was also conducted for the general population but was not a part of our strategy to arrive at utilities). Initially, the first level (i.e., no problems) of each health item was taken as the reference category. The coefficients for the remaining 3 levels were then estimated using 21 dummy variables (7 × 3). After preliminary analyses to assess “interaction,” the reference category was changed to the second level, because the first level in this health item did not represent the best health condition; i.e., the second level had the highest coefficient. This implies that the first level “highly playful/highly interactive” would result in a lower score in the valuation than would the second level “playful/interactive.”

The value of a health state j for individual i is denoted by Vij. It is assumed that Vij is a linear combination of the levels on the health items plus an error term εij for the individual. The model specification is

\[ V_{ij} = \sum_{j=1}^{n} \beta x_{ij} + \varepsilon_{ij} \tag{1} \]

where $\beta$ represents a vector of 21 regression coefficients and $x_{ij}$ a vector of 21 binary dummy explanatory variables ($x_{\delta\lambda}$), where $\lambda$ indicates the level of each of the 7 items ($\delta = 1, 2, \ldots, 7$) for a health state [16]. For example, $x_{42}$ represents the second level (slight problems) of the fourth item (“breathing”). All computations and the visualization of the results were carried out using Stata 15.0 [17], R programming language [18], and SigmaPlot 14.0 [19].
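As an illustration of the dummy coding and of how a logit model turns values into choice probabilities, consider the following minimal sketch. It is not the Stata analysis used in the study. The item order is assumed to follow the order in which the items are listed in the Instrument section, the helper names are hypothetical, and the reference level is Level 1 for every item except “interaction” (Level 2), as described above.

```python
import math

ITEMS = ["sleeping", "feeding", "breathing", "stooling", "mood", "skin", "interaction"]
REFERENCE = {item: 1 for item in ITEMS}
REFERENCE["interaction"] = 2   # reference changed to Level 2 for this item

# One dummy column per non-reference level: 7 items x 3 levels = 21 columns.
COLUMNS = [(item, level) for item in ITEMS for level in range(1, 5)
           if level != REFERENCE[item]]

def encode(state):
    """Turn a 7-digit state such as '3231421' into 21 binary dummies."""
    levels = dict(zip(ITEMS, (int(ch) for ch in state)))
    return [1 if levels[item] == level else 0 for item, level in COLUMNS]

def systematic_value(state, beta):
    """Linear part of the model: sum of coefficients for the active dummies."""
    return sum(b * x for b, x in zip(beta, encode(state)))

def prob_a_preferred(state_a, state_b, beta):
    """Logit probability that A is chosen over B in a two-alternative task."""
    diff = systematic_value(state_a, beta) - systematic_value(state_b, beta)
    return 1.0 / (1.0 + math.exp(-diff))

# Hypothetical coefficients, for demonstration only (not estimates from the study).
demo_beta = [-0.5] * len(COLUMNS)
print(len(COLUMNS), sum(encode("4444444")))
```

With two alternatives, the conditional logit choice probability reduces to the logistic function of the difference in systematic values, which is what `prob_a_preferred` computes.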

Next, the estimated coefficients from the primary caregivers were normalized. By normalizing we mean that the values are transformed or rescaled to produce a common utility scale (0–1). For this purpose, data obtained from the “death” task (Fig 2) performed by the general population sample were analyzed with the rank-ordered logit model (Stata, rologit) [20]. The same analysis was also conducted on the data collected from the primary caregivers, but it was not part of our strategy to arrive at utilities.

\[ V_{ij} = \sum_{j=1}^{n} \beta x_{ij} + \beta D + \varepsilon_{ij} \]

In addition to the 21 dummy variables, one for “death” (D) was used to rescale the values from 0.0 (death) to 1.0 (full health). By dividing the remaining coefficients by the estimated death coefficient, all health states were rescaled from 0.0 to 1.0. The utility of a health state is calculated as 1 minus the sum of coefficients for the corresponding levels on the health items.
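A minimal sketch of this rescaling, using the general-population (DC + Death) Level 4 coefficients and the death coefficient as printed in Table 2, is given below. The function names are hypothetical. Because the published coefficients are rounded to three decimals, the result (about 0.014) differs slightly from the reported utility of 0.015 for State 4444444; we assume this gap is due to rounding.

```python
# General-population (DC + Death) Level 4 coefficients as printed in Table 2.
GP_LEVEL4 = {
    "sleeping": -0.599, "feeding": -0.381, "breathing": -0.664,
    "stooling": -0.058, "mood": -0.300, "skin": -0.195, "interaction": -0.078,
}
GP_DEATH = -2.307  # coefficient of the "death" dummy

def rescale(coefficient, death_coefficient):
    """Express a coefficient as a fraction of the full-health-to-death distance."""
    return coefficient / death_coefficient

def utility(coefficients, death_coefficient):
    """1 minus the sum of rescaled coefficients (1.0 = full health, 0.0 = death)."""
    return 1.0 - sum(rescale(c, death_coefficient) for c in coefficients)

worst_state_utility = utility(GP_LEVEL4.values(), GP_DEATH)
print(round(worst_state_utility, 3))
```

By construction, a state whose coefficients sum to the death coefficient receives a utility of 0.0, and the reference state (no active dummies) receives 1.0.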

Results

Respondents

In total, 2638 respondents were recruited from China-Hong Kong (n = 828), the UK (n = 920), and the USA (n = 890). The sample consisted of members of the general population (n = 1409) and primary caregivers of infants (n = 1229). The average age of the respondents was 37 (median: 35) years, with 73% of the sample comprising females (Table 1). For the subsample of primary caregivers of infants, the average age was lower (33 years) and the proportion of females (93%) was substantially higher, which was expected given that a primary caregiver would typically be a young mother. For the additional study, a total of 1027 respondents (N = 523 for Task 1, N = 504 for Task 2) were recruited from among the members of the general population in the USA; 49% of the respondents were females and the mean age was 32 (median: 33) years.

Table 1. Demographics of the various study samples.

| Demographic characteristics | General population (N = 1409) | Primary caregivers (N = 1229) | Total sample (N = 2638) | Additional sample (n = 1027) |
|---|---|---|---|---|
| Country | | | | |
| China-Hong Kong | 421 (30%) | 407 (33%) | 828 (31%) | - |
| UK | 516 (37%) | 404 (33%) | 920 (35%) | - |
| USA | 472 (33%) | 418 (34%) | 890 (34%) | 1027 (100%) |
| Gender | | | | |
| Male | 632 (45%) | 81 (7%) | 713 (27%) | 523 (51%) |
| Female | 777 (55%) | 1148 (93%) | 1925 (73%) | 504 (49%) |
| Age (years) | | | | |
| Mean | 41 | 33 | 37 | 32 |
| Median | 39 | 33 | 35 | 33 |

States worse than death

Among the health states in the main study, the one mentioned most frequently as being worse than death was 4241241 (2.2%) for the general population and 4244231 (2.4%) for the primary caregivers. Among the 7 worst health states that were included in the additional study for the general population sample only, the percentage of respondents indicating a health state as being worse than death ranged between 20% and 25%.

Coefficients IQI items

The coefficients for the levels of the 7 IQI items based on judgments made by the primary caregivers showed that “breathing” had the highest impact on the HRQoL of infants (Table 2). For the sample of primary caregivers, all levels of the 7 IQI items proved to be statistically significant (coefficient different from 0.0), except “stooling” Level 3 (moderate stool/poo problems). Coefficients were negative for most of the levels of these items and followed a logical order (i.e., slight problems < moderate problems < severe problems). Negative coefficients implied that a particular level was worse than the baseline, which in our study was the first level of each health item, except for the interaction item. Moreover, the less preferable an item level was considered, the larger its coefficient was in the negative direction. For 4 items, i.e., “stooling,” “mood,” “skin,” and “interaction,” the order of the coefficients was not strictly monotonically decreasing. For example, the third level (moderate problems) of “stooling” had a positive, though not statistically significant, coefficient in the overall sample, indicating that it was preferred over the baseline level (no problems) and also over the second level (slight problems). For a more detailed discussion about the coefficients, see our earlier publication [6]. The coefficients obtained from the primary caregivers and the general population based on the choice experiment without “death” were rather comparable (Table 2, S3 Table). Slightly different results were obtained in the more complex choice experiment, which included the “death” options. Specifically, 4 non-statistically significant coefficients were observed for the general population sample and 5 were observed for the caregivers. The coefficients within the item “interaction” were almost equal for the general population sample, as well as the “death” coefficient (Table 2, S1 Table).

Table 2. Parameter estimates for the levels of the 7 IQI health items separately for the primary caregivers, the general population, and the normalized primary caregivers’ parameters.

| Item (level) | Primary caregivers (DC): Coefficient | SE | Significance | General population (DC + Death): Coefficient | SE | Significance | Primary caregivers, normalized scale (utilities): Coefficient |
|---|---|---|---|---|---|---|---|
| Sleeping (2) | -.246 | .046 | .000 | -.192 | .036 | .000 | -.056 |
| Sleeping (3) | -.403 | .052 | .000 | -.226 | .036 | .000 | -.092 |
| Sleeping (4) | -.774 | .052 | .000 | -.599 | .036 | .000 | -.176 |
| Feeding (2) | -.158 | .046 | .001 | -.191 | .035 | .000 | -.036 |
| Feeding (3) | -.162 | .050 | .001 | -.143 | .035 | .000 | -.037 |
| Feeding (4) | -.683 | .054 | .000 | -.381 | .035 | .000 | -.155 |
| Breathing (2) | -.395 | .049 | .000 | -.121 | .036 | .001 | -.090 |
| Breathing (3) | -.585 | .052 | .000 | -.204 | .036 | .000 | -.133 |
| Breathing (4) | -1.047 | .055 | .000 | -.664 | .036 | .000 | -.238 |
| Stooling (2) | -.100 | .045 | .003 | .037 | .033 | .272* | -.023 |
| Stooling (3) | -.039 | .052 | .455* | .178 | .035 | .000 | -.009 |
| Stooling (4) | -.268 | .063 | .000 | -.058 | .038 | .130* | -.061 |
| Mood (2) | -.509 | .047 | .000 | -.303 | .034 | .000 | -.116 |
| Mood (3) | -.380 | .049 | .000 | -.219 | .035 | .000 | -.086 |
| Mood (4) | -.613 | .058 | .000 | -.300 | .038 | .000 | -.139 |
| Skin (2) | -.166 | .045 | .000 | -.026 | .034 | .459* | -.038 |
| Skin (3) | -.120 | .048 | .014 | -.026 | .034 | .447* | -.027 |
| Skin (4) | -.416 | .056 | .000 | -.195 | .035 | .000 | -.095 |
| Interaction (1) | -.170 | .047 | .000 | -.081 | .035 | .020 | -.039 |
| Interaction (3) | -.360 | .049 | .000 | -.073 | .036 | .043 | -.082 |
| Interaction (4) | -.531 | .052 | .000 | -.078 | .037 | .035 | -.121 |
| Death | N. A. | N. A. | N. A. | -2.307 | .075 | .000 | N. A. |

* Coefficients not significantly different from the baseline category (Level 1)

IQI = Infant Quality of life Instrument

Some differences were observed between the different countries. In China, for example, sleeping was considered the most important item, whereas it was less important in the UK and USA. Feeding was considered more important in the USA, while breathing was considered less important in China. The coefficients for “death” were -1.769, -2.827, and -2.474, respectively, for China, the UK, and the USA. For more details see [7, S2 Table].

Utilities for IQI health states

According to the general population sample, no health state had a value below 0.0 (i.e., none was considered worse than death), as indicated by the normalized health state values (utilities). Values of the primary caregivers were normalized by the IQI utilities derived from the general population. The worst health state among the primary caregivers was 4444444, with a value of -4.332. Among the general population, the utility for that state was 0.015. The distance between 0.015 and zero (death) was used to normalize the caregivers’ values into utilities (S3 Fig). The final coefficients that were applied to compute utilities for IQI health states are presented in the last column of Table 2. Utilities for all possible IQI health states (N = 16,384) were calculated by subtracting from 1.0 the absolute values of the normalized coefficients corresponding to the levels of the 7 IQI items (Fig 3). In addition, the distribution of utilities for all IQI states was calculated (Fig 4). The utility for the worst IQI state (4444444) was 0.015, that for State 3333333 was 0.534, and that for State 2222222 was 0.641. The best IQI state (utility score 1.000) was 1111112, since Level 2 was the reference level for the interaction domain, and therefore, 1111112 was considered to represent perfect health. State 1111111 had a utility score of 0.961 (a complete list of utilities is available on request).
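The utilities quoted above can be reproduced from the last column of Table 2. The sketch below is illustrative, with hypothetical function names, and assumes the item order in which the items are listed in the Instrument section. The final lines also show that the published normalized coefficients are consistent with a simple linear rescaling of the caregivers’ coefficients by (1 − 0.015)/4.332; this is our reading of the numbers, not a statement of the authors’ code.

```python
# Normalized coefficients from the last column of Table 2.
# Reference levels have coefficient 0: Level 1, except Level 2 for "interaction".
NORMALIZED = {
    "sleeping":    {1: 0.0, 2: -0.056, 3: -0.092, 4: -0.176},
    "feeding":     {1: 0.0, 2: -0.036, 3: -0.037, 4: -0.155},
    "breathing":   {1: 0.0, 2: -0.090, 3: -0.133, 4: -0.238},
    "stooling":    {1: 0.0, 2: -0.023, 3: -0.009, 4: -0.061},
    "mood":        {1: 0.0, 2: -0.116, 3: -0.086, 4: -0.139},
    "skin":        {1: 0.0, 2: -0.038, 3: -0.027, 4: -0.095},
    "interaction": {1: -0.039, 2: 0.0, 3: -0.082, 4: -0.121},
}
ITEM_ORDER = list(NORMALIZED)  # assumed digit order of a 7-digit IQI state

def iqi_utility(state):
    """Utility of a 7-digit IQI state: 1.0 plus the (negative) coefficients."""
    if len(state) != 7 or any(ch not in "1234" for ch in state):
        raise ValueError("state must be 7 digits, each between 1 and 4")
    total = sum(NORMALIZED[item][int(ch)] for item, ch in zip(ITEM_ORDER, state))
    return round(1.0 + total, 3)

for s in ("4444444", "3333333", "2222222", "1111111", "1111112"):
    print(s, iqi_utility(s))

# Linear rescaling implied by the published values (an inference, see lead-in):
# caregivers' value for 4444444 is -4.332 and its target utility is 0.015.
scale = (1.0 - 0.015) / 4.332
breathing4_normalized = round(-1.047 * scale, 3)   # caregivers' Breathing (4)
```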

Fig 3. Estimated health-state values obtained for the primary caregivers with discrete choice tasks (pairs of IQI states) and for the general population with a ranking task (two IQI states + death).


Fig 4. Estimated health-state utilities obtained for the primary caregivers (discrete choice task) normalized for the location of death (based on values derived from task with death by the general population).


The same analysis for the three countries separately showed that the utilities for IQI state 4444444 were -0.121, 0.114, and -0.022, respectively, for China, the UK, and the USA. The distribution of utilities of the IQI health states for China differed from that for the UK and USA (S4 Fig). In the Chinese sample, health states were considered worse overall, and a small proportion of states were considered worse than death. Normalized coefficients for each country are presented in S3 Table.

Discussion

In this study, we explained a novel two-step procedure to generate utilities for the IQI, relying on a sample comprising both primary caregivers and members of the general population. Values were derived for IQI health states based on responses from primary caregivers (i.e., proxies). Results indicated that out of the 7 IQI items, “breathing” had the highest impact on the HRQoL of infants. Moreover, except for “stooling,” all item levels were statistically significant. Subsequently, these values of the caregivers were normalized into utilities by using information on the location of death on the scale, derived from a general population sample. Findings revealed that none of the health states contained in the IQI was worse than death.

According to the current convention in QALY computation, the lower anchor on the scale should be 0.0, defined as a state equivalent to death [21,22,23]. However, a critical problem that is associated with current economic valuation methods is the lack of a reliable method to determine the position of “death” on the HRQoL scale. The use of the concept of “death” in health measurement methods is a controversial issue [24]; some find it astonishing that health economists make death a central element of their valuation methods [25]. In the present study, the lowest utility for the worst IQI health state (4444444) was only just better than death, namely 0.015. A widely applied generic preference-based health status instrument like the EQ-5D has presented, in its original 3-level version (EQ-5D-3L), a negative value of -0.59 for its worst health state [26]. However, in the new 5-level version (EQ-5D-5L) the value for its worst state is less negative (-0.29), as observed in many other preference-based competitor instruments [27]. The Health Utility Index Mark III, for example, shows a value of -0.36 for its worst state, but other widely used instruments, such as the Quality of Wellbeing index, 15D, SF-6D, and AQol-8D, show positive values of 0.32, 0.11, 0.20, and >0.20, respectively [28,29]. Additionally, in practice, since reimbursement decisions will be made based on a broader range of criteria than just cost-effectiveness, we do not expect that having the lowest IQI value higher than 0 would be a disadvantage for the infant population.

In this study, we used conventional discrete choice models to derive utilities for the IQI. Recently, we introduced a novel and straightforward measurement system to derive weights (coefficients) to estimate values and utilities. This measurement system is based on item response theory and discrete choice methods [30]. The model continuously collects responses from patients or proxies and integrates findings from the health classification and preference tasks [31,32]. With this new method, the health-outcome measures collected are not only preference-based but also patient-centered [33,34,35]. Once tested and validated, we intend to apply this measurement model in future clinical IQI studies to estimate utilities.

The country-specific results for the primary caregivers revealed a number of differences. Overall, the UK and USA appeared more alike, while China was slightly different. This could have its source in the different cultures of the countries in the sample, but given that sample sizes of around 400 were used for each country, a part of this finding may also be an artefact. In China, sleep appeared to be more important, whereas in the UK and USA, higher coefficients were observed for most other attributes, and more value was attached to breathing, mood, and interaction. Eventually, a larger sample should be used to determine final value sets based on country-specific preferences.

A limitation associated with this study is that no obvious state worse than death was found. From a statistical perspective, values are likely to be biased when a significant proportion of the sample takes the normative position that all life is worth living [36]. However, the additional analysis in which 21 pairs of severe health states were utilized demonstrated ample effect on the estimation of the worse health states. The same holds for the analysis that included death. Therefore, the utility (0.015) of the worst IQI health state (4444444) seems credible as it was confirmed by the statistical models and by the simple preference task used in the additional analysis. Another limitation was that detailed characteristics of the respondents were not available, apart from country, age, and sex. Therefore, we were not able to, for instance, say which part of the general population sample comprised parents, nor could we perform stratified analyses based on socioeconomic status. It should be kept in mind that the present study was largely intended as a proof of principle to demonstrate how the process of generating normalized values (utilities) can take place and to provide a first value set for the IQI. From that perspective, the generalizability of the results is of less importance at this stage.

The IQI is the first generic preference-based instrument to assess overall HRQoL in 0–1-year-old infants. We demonstrated that it is possible to generate an algorithm to derive utilities from the infant health states using discrete choice experiments. The next step will be to use the IQI in a clinical population of infants to see how it performs as compared to other instruments, and to collect responses to refine the value set over time.

Supporting information

S1 Fig. Additional study (Part 1) in which the 7 worst IQI health states (apart from 4444444) were assessed against each other.

(EPS)

S2 Fig. Additional study (Part 2) in which the 7 worst IQI health states (apart from 4444444) were assessed against death.

(EPS)

S3 Fig. Relationship between the values for the primary caregivers (bottom) and the normalized values for the general population (top).

(EPS)

S4 Fig. Normalized values (utilities) per country (same X- and Y-axis).

(EPS)

S1 Table. Parameter estimates for the levels of the 7 IQI health items separately for the general population and primary caregivers.

(DOCX)

S2 Table. Parameter estimates (values) for the levels of the 7 IQI health items for the primary caregivers, per country.

(DOCX)

S3 Table. Parameter estimates (normalized scale = utilities) for the levels of the 7 IQI health items for the primary caregivers, per country.

(DOCX)

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

This study has been funded by Nestec Ltd to finance the research activities of RJ, KV, AvA and PK. PD and LD are employed at Nestlé Research Center. The funder provided support in the form of salaries for authors [RJ, KV, PD, LD, AvA, PK], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

References

  • 1. Drummond MF, Sculpher MJ, Claxton K, Stoddart GL, Torrance GW. Methods for the economic evaluation of health care programmes. Oxford: Oxford University Press; 2015.
  • 2. Krabbe PFM. The measurement of health and health status: Concepts, methods and applications from a multidisciplinary perspective. San Diego: Elsevier/Academic Press; 2016.
  • 3. Feinstein A. An additional basic science for clinical medicine: IV. The development of clinimetrics. Ann Intern Med. 1983;99: 834–848. doi: 10.7326/0003-4819-99-6-834
  • 4. Boyle M, Torrance W. Developing multiattribute health indexes. Med Care 1984;22(11): 1045–1057. doi: 10.1097/00005650-198411000-00007
  • 5. Neumann PJ, Sanders GD, Russell LB, Siegel JE, Ganiats TG. Cost-effectiveness in health and medicine. Oxford: University Press; 2016.
  • 6. Jabrayilov R, van Asselt ADI, Vermeulen KM, Volger S, Detzel P, Dainelli L, et al. A descriptive system for the Infant health-related Quality of life Instrument (IQI): Measuring health with a mobile app. PLoS One. 2018;13(8): e0203276. doi: 10.1371/journal.pone.0203276
  • 7. Jabrayilov R, Vermeulen KM, Detzel P, Dainelli L, van Asselt ADI, Krabbe PFM. The Infant health-related Quality of life Instrument (IQI): Valuing health status in the first year of life. Value Health 2019;22(6): 721–727. doi: 10.1016/j.jval.2018.12.009
  • 8. Kreimeier S, Cole A, Devlin N, Herdman M, Mulhern B, Oppe M, et al. Comparing valuation of the EQ-5D-Y and the EQ-5D-3L: The impact of wording and perspective. Proceedings 32nd EuroQol Plenary Meeting 2015, Krakow; 2015.
  • 9. Ratcliffe J, Chen G, Stevens K, Bradley S, Couzner L, Brazier J, et al. Valuing Child Health Utility 9D health states with young adults: Insights from a time trade off Study. Appl Health Econ Health Policy. 2015;13(5): 485–492. doi: 10.1007/s40258-015-0184-3
  • 10. Ratcliffe J, Couzner L, Flynn T, Sawyer M, Stevens K, Brazier J, et al. Valuing Child Health Utility 9D health states with a young adolescent sample: a feasibility study to compare best-worst scaling discrete-choice experiment, standard gamble and time trade-off methods. Appl Health Econ Health Policy. 2011;9: 15–27. doi: 10.2165/11536960-000000000-00000
  • 11. Attema AE, Edelaar-Peeters Y, Versteegh MM, Stolk E. Time trade-off: One methodology, different methods. Eur J Health Econ. 2013: S53–64. doi: 10.1007/s10198-013-0508-x
  • 12. Salomon J. Techniques for valuing health states. In: Culyer AJ (Ed.), Encyclopedia of health economics (Vol. 3, pp. 454–8). San Diego: Elsevier; 2014.
  • 13. Gandjour A. Theoretical foundation of patient v. population preferences in calculating QALYs. Med Decis Making. 2010;30: E57–63. doi: 10.1177/0272989X10370488
  • 14. Arons AMM, Krabbe PFM. Probabilistic choice models in health-state valuation research: background, theory, assumptions and relationships. Expert Rev of Pharmacoecon Outcomes Res. 2013;13: 93–108.
  • 15. The MathWorks. MATLAB User’s Guide. The MathWorks, Inc., Natick, MA: 1993.
  • 16. McCabe C, Brazier J, Gilks P, Tsuchiya A, Roberts J, O’Hagan A, et al. Using rank data to estimate health state utility models. J Health Econ 2006;25: 418–431. doi: 10.1016/j.jhealeco.2005.07.008
  • 17. StataCorp LLC, College Station, Texas, USA.
  • 18. R Core Team (2013). R: A language and environment for statistical computing.
  • 19. Systat Software, Inc., San Jose, USA.
  • 20. Salomon JA. Reconsidering the use of rankings in the valuation of health states: a model for estimating cardinal values from ordinal data. Popul Health Metr. 2003;12: 1–12.
  • 21. Weinstein MC, Fineberg HV. Clinical decision analysis. Philadelphia: WB Saunders; 1980.
  • 22. Torrance GW, Thomas WH, Sackett DL. A utility maximization model for evaluation of health care programs. Health Serv Res 1972;7: 118–133.
  • 23. Weinstein MC, Torrance G, McGuire A. QALYs: The basics. Value Health 2009;12: S5–9. doi: 10.1111/j.1524-4733.2009.00515.x
  • 24. Norman R, Mulhern B, Viney R. The impact of different DCE-based approaches when anchoring utility scores. Pharmacoeconomics 2016;34(8): 805–814. doi: 10.1007/s40273-016-0399-7
  • 25. Kamm FM. Morality, mortality. Volume I: Death and whom to save from it. New York: Oxford University Press; 1993.
  • 26. Dolan P. Modeling valuation for EuroQol health states. Medical Care. 1997;35: 1095–1108. doi: 10.1097/00005650-199711000-00002
  • 27. Devlin N, Brazier J, Pickard AS, Stolk E. 3L, 5L, What the L? A NICE Conundrum. Pharmacoeconomics. 2018;36: 3–6.
  • 28. Richardson J, Mckie J, Bariola E. Multiattribute utility instruments and their use. In: Culyer AJ (Ed.), Encyclopedia of health economics (Vol. 2, pp. 341–357). San Diego: Elsevier; 2014.
  • 29. Richardson J, Sinha K, Iezzi A, Khan MA. Modelling utility weights for the Assessment of Quality of Life (AQoL)-8D. Qual Life Res. 2014;23: 2395–2404. doi: 10.1007/s11136-014-0686-8
  • 30. Krabbe PFM. A generalized measurement model to quantify health: the multi-attribute preference response model. PLoS One 2013;8: e79494. doi: 10.1371/journal.pone.0079494
  • 31. Groothuis-Oudshoorn CGM, van der Heuvel E, Krabbe PFM. A preference-based item response theory model to measure health: concept and mathematics of the multi-attribute preference response model. BMC Med Res Methodol 2018;18: 62. doi: 10.1186/s12874-018-0516-8
  • 32. Rasch G. Probabilistic models for some intelligence and attainment tests (Expanded edition with foreword and afterword by B.D. Wright). Chicago: University of Chicago Press; 1980.
  • 33. Sullivan M. The new subjective medicine: taking the patient’s point of view on health care and health. Soc Sci Med. 2003;56: 1595–1604. doi: 10.1016/s0277-9536(02)00159-4
  • 34. Gabriel SE, Normand S-LT. Getting the methods right—the foundation of patient-centered outcomes research. N Engl J Med. 2012;367: 787–790. doi: 10.1056/NEJMp1207437
  • 35. Reneman MF, Brandsema KPD, Schrier E, Dijkstra PU, Krabbe PFM. Patients first: towards a patient-centered, instrument to measure impact of chronic pain. Phys Ther 2018;98: 616–625. doi: 10.1093/ptj/pzy040
  • 36. Flynn TN, Louviere JJ, Marley AA, Coast J, Peters TJ. Rescaling quality of life values from discrete choice experiments for use as QALYs: a cautionary tale. Popul Health Metr. 2008;6: 6. doi: 10.1186/1478-7954-6-6

Decision Letter 0

Jing Tian

30 Oct 2019

PONE-D-19-19703

A two-step procedure to generate utilities for the Infant health-related Quality of life Instrument (IQI)

PLOS ONE

Dear Mr. Krabbe,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Dec 13 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Jing Tian

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Financial Disclosure section:

'This study has been funded by Nestec Ltd to finance the research activities of RJ, KV, AvA and PK. PD and LD are employed at Nestlé Research Center.

The funder provided support in the form of salaries for authors [RJ, KV, PD, LD, AvA, PK], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.'

We note that one or more of the authors are employed by a commercial company: Nestle

1. Please provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc. 

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: “This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include an updated Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Additional Editor Comments (if provided):


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The purpose of this study is to generate an algorithm to estimate utilities for the infant quality of life instrument (IQI) using a two-step choice-based modelling approach. The approach and methods used for the study seem appropriate and I have just a few comments.

This study uses a population from the UK, USA and China-Hongkong and my question relates to why this population was used. Was it based on availability or was it to ensure generalisability? In addition, I am not entirely sure why the second study was limited to USA. Perhaps the authors need to comment on the generalisability of the results.

It would help if a table with the utility values for the various health states is presented in the appendix. Linked to this point, the lowest/worst health state is represented by 0.015 and as a result, there is a gap between this value and zero which is usually considered the anchor point for utility scales. As a result of this, the use of the IQI would imply that there could potentially be an over-estimation of utility for the severest health states which could potentially lead to a situation where the paediatric population would be disadvantaged compared to e.g. an adult population. Perhaps, the authors should also comment on the implications of using this instrument to assess the cost-effectiveness of interventions.

Reviewer #2: General comments to the authors

Thank you to the authors for providing me with the opportunity to review this novel and interesting quality of life research paper. The authors identified a substantial gap in the literature in their previous published work and are now developing the algorithm for the previously established descriptive system of the IQI to capture and assess proxy-reported outcomes for the derivation of health state utility valuations for infants aged 0-1 years.

I note the underpinning theory regarding the development of the IQI’s descriptive/classification system by Krabbe et al’s authorship team in 2018 (https://doi.org/10.1371/journal.pone.0203276), where the views of the primary caregivers serve as the proxy and provide the response. This paper is the logical next step to that published work.

Overall, I consider that it would be useful to provide some additional background in the Introduction section of the manuscript regarding the development of the IQI’s descriptive/classification system, including the literature search and use of expert panels in this development. This would contextualise the reasons for this paper to the broader readership.

I also suggest that the Discussion section could be more circumspect about the development of this initial (albeit important) value set. Please include caveats regarding future confirmatory and clinical studies and perhaps refinement of the value set over time.

Finally, I also suggest that the manuscript should be thoroughly checked for grammatical errors.

I recommend the paper for publication if my key concerns are appropriately addressed as outlined below.

Specific comments to the authors:

Introduction section:

Line 69: The introductory paragraph could perhaps mention full health economic evaluation and cost-utility analyses to contextualise the generation of utilities as a health economic input metric for cost-utility analyses (refer Drummond et al 2015).

Line 86: As noted in the general comments, please include some additional background regarding the development of the IQI’s descriptive system.

Line 93 Aims: The ‘aims’ paragraph could be improved by describing how you planned to achieve your key objective with some further detail and perhaps numbered sequentially.

Methods section:

Line 108: please provide further explanation regarding the ‘skin’ dimension being classified ‘qualitatively’. Is the ‘mood’ dimension perhaps phrased qualitatively too? Do you mean that there needs to be a more subjective assessment of the response?

Line 110: is the term ‘parents’ = primary carers? Are there ‘primary carers’ of the infants who are not classified as parents? Please clarify.

Line 219: Could you please relabel this “Recruitment of Participants (or Respondents)”. Please expand on this fundamental section: the recruitment strategy requires additional explanation. I note that a fulsome explanation was provided of the discrete choice design and the subsequent analyses. It is insufficient to say that participants were reached through a market research company. I need to understand the sampling and any bias. I also need to understand why participants from the USA were recruited only for the second study. Please also clarify in this section the meaning of the ‘main study’ and the ‘additional study’.

Results section:

I note that the line numbers on the PDF files were not reproduced from the Results section, therefore the below comments are outlined section by section.

Given the international nature of the cohort, stratified results should be reported beyond section 3.1 to highlight any cultural/country differences.

Section 3.4: Please provide further contextualisation regarding the utility derived for ‘1111112’ and ‘1111111’. The reference point and the derivation of a utility score of 1.0 for 1111112 and 0.96 for 1111111 may require further clarification for the broader readership.

Discussion section:

The Discussion is perhaps the weakest section of the paper and should be expanded to showcase the key and secondary findings of the paper, thoroughly outline all of the limitations of the study, and provide a robust conclusion.

The first paragraph should provide an overall summary of the findings. One sentence is inadequate and this sentence describes what was done rather than a succinct summary of the key findings of the paper.

The Discussion also does not outline the international nature of the cohort, nor issues surrounding cultural differences for the proxy-respondents as primary carers of infants and therefore the generation of potentially different value sets.

In the second paragraph of the Discussion, perhaps remove the word ‘first’ – the authors do not then go to ‘second’, ‘third’ etc to expand on the key findings. The second paragraph could also reference the AQoL-8D multi-attribute utility instrument’s algorithmic range as an example of an instrument that does not record a utility value that is less than zero.

In the third paragraph, I would remove the statement that “It is likely that states worse than death are less self-evident than generally thought and that the lowest utility for the EQ-5D-3L may have been an accidental finding”. Perhaps the suggestion could be underpinned by statements about the instruments only 243 health states and that the instrument is relatively insensitive to complex and chronic disease states. Please use appropriate referencing for this statement.

The fourth paragraph could be tightened to provide some additional explanation and contextualisation regarding the underpinning model and the advantages of the model. This paragraph in its current form is somewhat jumbled.

Limitations section. Please be more circumspect in this section – one limitation only is outlined. The recruitment/sampling strategy is not properly described; therefore, I cannot comment on limitations regarding the sampling strategy. Similarly, stratified results are not presented; I would expect that cultural differences would be evident in stratified results. Stratified results could also be the subject of further discussion.

The conclusion should provide additional explanation regarding future research and confirmatory studies.

Reviewer #3: The sample consisted of members of the general population (n=1409) and infant caregivers (n=1229). Were the members of the general population parents? If not, is there reason to believe that the sample of the general population would answer the questions significantly differently than the infant caregivers? I’m assuming from some of the information that follows in the manuscript, that yes, the subsamples answer some questions differently, but

In section 3.2 “states worse than death”, the authors write, “Among the health states in the main study, the one most frequently mentioned as being worse than death was 4241241 (2.2%) for the general population and 4244231 (2.4%) for the primary caregivers.” The digits are poorly explained in section 2.1 instrument. A slightly more detailed explanation would be useful for the reader earlier in the paper (perhaps expand the example given in line 113, “e.g., 3231421 equates to moderately affected sleep, slight feeding problems, moderate breathing problems, sleeps well, inconsolable crying, dry or red skin, highly playful/highly interactive.”). As it is, I’m not even certain that I have interpreted this correctly. Is this correct?

Figure 2 is confusing because at the top it says, “Suppose that an infant’s first year of life is spent mainly in either State A or State B and that its health is uncertain afterwards.” At the bottom of the figure, “Please indicate if you would consider health state A and B as better or worse than death.” The use of the word “and” at the bottom leads me to believe that the infant has all the conditions in both A and B. Is this because there are 2 options to answer at the bottom?

“The main limitation associated with this study is that no obvious state worse than death was found.” It seems that the descriptions “Severely disturbed sleep, severe feeding problems, severe breathing problems, severe stool problems, inconsolable crying, bleeding or cracked skin, low-energy/inactive/dull” do not describe the pain of the infant and so do not elicit a response from an adult that there is no obvious state worse than death. Is it possible that the adults who took this survey view the 4’s as problems that occur occasionally and do not see these issues as worse than death? Adults can forget that infants are altricial, making the 4’s much more severe than they are for adults. How do you expect the results would vary if you included an outcome statement after each option (e.g., “severe stool/poo problems, resulting in hospitalization”?

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Julie Campbell

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Apr 3;15(4):e0230852. doi: 10.1371/journal.pone.0230852.r002

Author response to Decision Letter 0


3 Jan 2020

Reviewer #1

The purpose of this study is to generate an algorithm to estimate utilities for the infant quality of life instrument (IQI) using a two-step choice-based modelling approach. The approach and methods used for the study seem appropriate and I have just a few comments.

This study uses a population from the UK, USA and China-Hongkong and my question relates to why this population was used. Was it based on availability or was it to ensure generalisability? In addition, I am not entirely sure why the second study was limited to USA. Perhaps the authors need to comment on the generalisability of the results.

Thank you for pointing this out. We agree and have now clarified the choice of population in the Methods section.

“These countries were selected for practical reasons, since they are culturally different yet share one language, thus eliminating the need for translation at this phase, and enabling the analysis of possible cross-cultural differences in the results, and therefore improving generalizability.”

It would help if a table with the utility values for the various health states is presented in the appendix. Linked to this point, the lowest/worst health state is represented by 0.015 and as a result, there is a gap between this value and zero which is usually considered the anchor point for utility scales. As a result of this, the use of the IQI would imply that there could potentially be an over-estimation of utility for the severest health states which could potentially lead to a situation where the paediatric population would be disadvantaged compared to e.g. an adult population. Perhaps, the authors should also comment on the implications of using this instrument to assess the cost-effectiveness of interventions.

Although we agree that a complete table would be informative, presenting the estimated utilities for all IQI health states is not feasible, because there are 16,384 unique health states in total (4^7: 7 items with 4 levels each). Therefore, we have listed a few health states with their estimated utility values as examples (section 3.4). The complete list is available on request.
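The count of 16,384 states (four levels on each of seven items) can be checked with a short enumeration. The following is a minimal sketch, not part of the study's own analysis code: the function names are hypothetical, and the digit order is assumed to follow the item order given in the abstract (sleeping, feeding, breathing, stooling/poo, mood, skin, interaction).

```python
from itertools import product

# Item order is an assumption taken from the order in which the abstract lists the items.
ITEMS = ["sleeping", "feeding", "breathing", "stooling/poo", "mood", "skin", "interaction"]
LEVELS = "1234"  # four levels per item


def all_states():
    """Return every IQI health-state code as a 7-digit string (hypothetical helper)."""
    return ["".join(levels) for levels in product(LEVELS, repeat=len(ITEMS))]


def decode(state):
    """Map a 7-digit state code to {item: level}; raises on malformed codes."""
    if len(state) != len(ITEMS) or any(ch not in LEVELS for ch in state):
        raise ValueError(f"not a valid state code: {state!r}")
    return {item: int(ch) for item, ch in zip(ITEMS, state)}


states = all_states()
print(len(states))            # number of unique health states
print(states[0], states[-1])  # first and last codes in lexicographic order
print(decode("4444444"))
```

The sketch only reports level numbers per item; the verbal level descriptions belong to the IQI classification system and are not reproduced here.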

Indeed, the lowest utility for the IQI is 0.015. In the Discussion we address the lowest values for existing generic utility systems (e.g., EQ-5D, HUI, QWB, 15D, SF-6D, AQoL-8D). Some of these instruments allow utility values to drop below 0.0 (worse than dead). The fact that the lowest IQI utility lies just above the value for dead shows that, overall, respondents (caregivers and people from the general population) considered a baby in a very bad health condition to be in a health state that is (slightly) better than death. To reflect on the possible consequences of this for cost-effectiveness analysis, we added the sentence below to the Discussion.

“… since reimbursement decisions will be made based on a broader range of criteria than just cost-effectiveness, we do not expect that having the lowest IQI value higher than 0 would be a disadvantage for the infant population.”

Reviewer #2

General comments to the authors

Thank you to the authors for providing me with the opportunity to review this novel and interesting quality of life research paper. The authors identified a substantial gap in the literature in their previous published work and are now developing the algorithm for the previously established descriptive system of the IQI to capture and assess proxy-reported outcomes for the derivation of health state utility valuations for infants aged 0-1 years.

I note the underpinning theory regarding the development of the IQI’s descriptive/classification system by Krabbe et al’s authorship team in 2018 (https://doi.org/10.1371/journal.pone.0203276), where the views of the primary caregivers serve as the proxy and provide the response. This paper is the logical next step to that published work.

Overall, I consider that it would be useful to provide some additional background in the Introduction section of the manuscript regarding the development of the IQI’s descriptive/classification system, including the literature search and use of expert panels in this development. This would contextualise the reasons for this paper to the broader readership.

Thank you for this suggestion to position our paper in our broader scientific work. Please see our reply on this issue on the next page (Specific comments to the authors).

I also suggest that the Discussion section could be more circumspect about the development of this initial (albeit important) value set. Please include caveats regarding future confirmatory and clinical studies and perhaps refinement of the value set over time.

Clinical studies using the IQI are currently ongoing. Their results will help to populate the data set on which the value set was built and to refine it over time. We have added a sentence on this point in the Discussion.

Finally, I also suggest that the manuscript should be thoroughly checked for grammatical errors.

We have sent the manuscript to a professional language revision service before resubmitting it.

I recommend the paper for publication if my key concerns are appropriately addressed as outlined below.

Specific comments to the authors

Introduction section

Line 69: The introductory paragraph could perhaps mention full health economic evaluation and cost-utility analyses to contextualise the generation of utilities as a health economic input metric for cost-utility analyses (refer Drummond et al 2015).

Thank you for this suggestion. We have inserted a reference to the handbook of Drummond et al. after the second sentence of the Introduction.

Line 86: As noted in the general comments, please include some additional background regarding the development of the IQI’s descriptive system.

Thank you for this suggestion. We have added the following text in the Introduction:

“For the selection of the relevant domains (items) for the IQI, a multi-step development process began by extracting candidate health concepts from relevant measures that were identified by searching the literature. Next, panels, with experts from Asia, Europe, New Zealand, and the United States of America (USA), and two surveys, with primary caregivers in New Zealand, Singapore, and the United Kingdom (UK), evaluated the relevance of the candidate health concepts, organized them into attributes based on their similarities, explored alternative attributes, and generated response scales. Additional interviews assessed the cross-cultural interpretability, parents’ understanding of health attributes, and the usability of the mobile application.”

Line 93 Aims: The ‘aims’ paragraph could be improved by describing how you planned to achieve your key objective with some further detail and perhaps numbered sequentially.

Thank you for this suggestion. We have revised the aims paragraph as follows:

“The present study aimed to explain how utilities for the IQI health states were generated with a novel two-step choice-based modeling procedure that helps locate the position of death on the IQI scale [14]. Specifically, this process involved the following two steps: 1) deriving values for a set of IQI health states from primary caregivers of infants based on a discrete choice modeling exercise, 2) normalizing the values obtained in Step 1 to an anchored 0.0–1.0 scale using utilities derived from responses collected from a general population sample.”
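Step 2 above anchors the caregiver values to a 0.0–1.0 scale. A minimal sketch of such an anchoring, assuming a simple linear rescaling between the value of the best state and the value assigned to death, is shown below. The function name and the numbers in the usage lines are hypothetical and serve only as illustration; they are not the published IQI estimates, and the manuscript's rank-ordered logit step may differ in detail.

```python
def rescale_to_utility(value: float, value_full_health: float, value_death: float) -> float:
    """Linearly map a latent value so that full health = 1.0 and death = 0.0.

    Illustrative only: all three inputs are assumed to be on the same latent scale.
    """
    span = value_full_health - value_death
    if span <= 0:
        raise ValueError("full health must be valued above death")
    return (value - value_death) / span


# Hypothetical latent values (not taken from the study).
best, death = 2.0, -3.0
print(rescale_to_utility(best, best, death))   # best state maps to the top anchor
print(rescale_to_utility(death, best, death))  # death maps to the bottom anchor
print(rescale_to_utility(-4.0, best, death))   # a value below death yields a negative utility
```

On such a scale, a state would count as "worse than death" only if its latent value fell below the value assigned to death.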

Methods section

Line 108: please provide further explanation regarding the ‘skin’ dimension being classified ‘qualitatively’. Is the ‘mood’ dimension perhaps phrased qualitatively too? Do you mean that there needs to be a more subjective assessment of the response?

Indeed, as explained in one of our previous publications on the IQI valuation study [Jabrayilov et al., 2019], no monotonically decreasing coefficients were observed for mood and skin, confirming the qualitative nature (i.e., no logical ordering of levels) of these two items. However, this does not imply a more subjective assessment of the response; these items can be analyzed in the same way as the other items.

Jabrayilov R, Vermeulen KM, Detzel P, Dainelli L, van Asselt ADI, Krabbe PFM. The Infant health-related Quality of life Instrument (IQI): Valuing health status in the first year of life. Value Health 2019;22(6):721-727.

Line 110: is the term ‘parents’ = primary carers? Are there ‘primary carers’ of the infants who are not classified as parents? Please clarify.

The two terms (parents and primary caregivers) were used interchangeably because every caregiver interviewed was a parent. However, for consistency and to avoid confusion, we now use a single term: the word “parent” has been replaced with “caregiver”.

Line 219: Could you please relabel this “Recruitment of Participants (or Respondents)”. Please expand on this fundamental section: the recruitment strategy requires additional explanation. I note that a fulsome explanation was provided of the discrete choice design and the subsequent analyses. It is insufficient to say that participants were reached through a market research company. I need to understand the sampling and any bias. I also need to understand why participants from the USA were recruited only for the second study. Please also clarify in this section the meaning of the ‘main study’ and the ‘additional study’.

We have renamed the subheading “Recruitment of respondents”. In this section, we also added more information about the sampling strategy and clarified the purpose of the main study and the additional study. We added the text below to the manuscript:

“These countries were selected for practical reasons, since they are culturally different yet share one language, thus eliminating the need for translation at this phase, and enabling the analysis of possible cross-cultural differences in the results, and therefore improving generalizability. Clear instructions were given to all participants. While the instrument targets infants up to one year, in the survey we chose to include primary caregivers of 2- and 3-year-olds as well, to enable the recruitment of a larger sample. We assumed that the caregivers could recollect their experiences of the first year of their infant’s life quite easily. The general population sample included both parents (of children with varying ages) and respondents without children, to be as representative as possible. The latter were interviewed because they might think differently about the value of life in different health conditions.

In an additional study, a separate smaller sample was drawn in the USA from among members of the general population. This study was conducted to gain a better understanding of the severe IQI health states to explore and confirm our findings from the main study.”

Results section

Given the international nature of the cohort, stratified results should be reported beyond section 3.1 to highlight any cultural/country differences.

We have performed separate analyses for the three countries. In the new version of the paper, we have added a brief description of the results in the main text (lines 262-266, 299-304) and inserted 2 additional tables and 1 figure (S4 Table, S6 Table, S7 Fig) to present these.

Section 3.4: Please provide further contextualisation regarding the utility derived for ‘1111112’ and ‘1111111’. The reference point and the derivation of a utility score of 1.0 for 1111112 and 0.96 for 1111111 may require further clarification for the broader readership.

As stated in the methods section “After preliminary analyses to assess “interaction,” the reference category was changed to the second level, because the first level in this health item did not represent the best health condition; i.e., the second level had the highest coefficient. This implies that the first level “highly playful/highly interactive” would result in a lower score in the valuation than would the second level “playful/interactive.”

Then, in the results section where the utilities are derived, it says: ‘The best IQI state (utility score 1.000) was 1111112, since level 2 was the reference level for the interaction domain. State 1111111 had a utility score of 0.961’.

We assumed that the matter was sufficiently explained in this way, but we propose changing the latter sentence to: “The best IQI state (utility score 1.000) was 1111112, since Level 2 was the reference level for the interaction domain, and therefore, 1111112 was considered to represent perfect health. State 1111111 had a utility score of 0.961”.
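The role of the reference level can be illustrated with a short sketch. It assumes an additive model in which each item level carries a coefficient and the reference level of each item is fixed at 0.0. All coefficients below are hypothetical placeholders chosen only so that Level 2 of “interaction” has the highest coefficient, as described above; they are not the published estimates.

```python
ITEMS = ("sleeping", "feeding", "breathing", "stooling/poo", "mood", "skin", "interaction")

# Hypothetical coefficients (NOT the published IQI estimates); reference level = 0.0.
COEFS = {item: {1: 0.0, 2: -0.3, 3: -0.7, 4: -1.2} for item in ITEMS}
# For "interaction", Level 2 is the reference, so Level 1 gets a small negative value.
COEFS["interaction"] = {1: -0.1, 2: 0.0, 3: -0.5, 4: -1.0}


def latent_value(state: str) -> float:
    """Sum the coefficients of the levels that make up a 7-digit state."""
    return sum(COEFS[item][int(digit)] for item, digit in zip(ITEMS, state))


def best_state() -> str:
    """Pick, per item, the level with the highest coefficient."""
    return "".join(str(max(levels, key=levels.get)) for levels in COEFS.values())


print(best_state())  # the state built from the highest-coefficient level of each item
print(latent_value("1111112") > latent_value("1111111"))
```

Under these assumptions the best state ends in 2 rather than 1, which is why 1111112 (and not 1111111) sits at the top of the scale.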

Discussion section

The Discussion is perhaps the weakest section of the paper and should be expanded to showcase the key and secondary findings of the paper, thoroughly outline all of the limitations of the study, and provide a robust conclusion. The first paragraph should provide an overall summary of the findings. One sentence is inadequate and this sentence describes what was done rather than a succinct summary of the key findings of the paper.

Thank you for this suggestion. We have rewritten the first paragraph, see below:

“In this study, we explained a novel two-step procedure to generate utilities for the IQI, relying on a sample comprising both primary caregivers and members of the general population. Values were derived for IQI health states based on responses from primary caregivers (e.g., proxies). Results indicated that out of the 7 IQI items, “breathing” had the highest impact on the HRQoL of infants. Moreover, except for “stooling,” all item levels were statistically significant. Subsequently, these values of the caregivers were normalized into utilities by using information on the location of death on the scale, derived from a general population sample. Findings revealed that none of the health states contained in the IQI was worse than death.”

The Discussion also does not outline the international nature of the cohort, nor issues surrounding cultural differences for the proxy-respondents as primary carers of infants and therefore the generation of potentially different value sets?

We have now added results per country in supplementary tables and reported on the differences between countries in the main text of the Results section. We also reflected on the cultural differences between countries in the Discussion section. We do believe that country-specific value sets are eventually warranted, but the primary aim of this study was to demonstrate the feasibility of the method, not to generate local value sets.

Added to discussion:

“The country-specific results for the primary caregivers revealed a number of differences. Overall, the UK and USA appeared more alike, while China was slightly different. This could have its source in the different cultures of the countries in the sample, but given that sample sizes of around 400 were used for each country, a part of this finding may also be an artefact. In China, sleep appeared to be more important, whereas in the UK and USA, higher coefficients were observed for most other attributes, and more value was attached to breathing, mood, and interaction. Eventually, a larger sample should be used to determine final value sets based on country-specific preferences.”

In the second paragraph of the Discussion, perhaps remove the word ‘first’ – the authors do not then go to ‘second’, ‘third’ etc to expand on the key findings. The second paragraph could also reference the AQoL-8D multi-attribute utility instrument’s algorithmic range as an example of an instrument that does not record a utility value that is less than zero.

Thank you for noticing this. We have dropped the word “First”. In addition, we have inserted a new reference to the Australian AQoL-8D instrument [Richardson, 2014]. Figure 3 of that study, which is based on a different sample from the one presented in Culyer’s Encyclopedia, indeed shows that the lowest AQoL-8D health state value is approximately 0.25.

Richardson J, Sinha K, Iezzi A, Khan MA. Modelling utility weights for the Assessment of Quality of Life (AQoL)-8D. Quality of Life Research. 2014;23:2395-2404.

In the third paragraph, I would remove the statement that “It is likely that states worse than death are less self-evident than generally thought and that the lowest utility for the EQ-5D-3L may have been an accidental finding”. Perhaps the suggestion could be underpinned by statements about the instrument’s only 243 health states and that the instrument is relatively insensitive to complex and chronic disease states. Please use appropriate referencing for this statement.

We have dropped this sentence.

The fourth paragraph could be tightened to provide some additional explanation and contextualisation regarding the underpinning model and the advantages of the model. This paragraph in its current form is somewhat jumbled.

Thank you for this suggestion. We have rewritten this paragraph (Discussion: lines 351-357).

Limitations section. Please be more circumspect in this section – one limitation only is outlined. The recruitment/sampling strategy is not properly described; therefore, I cannot comment on limitations regarding the sampling strategy. Similarly, stratified results are not presented; I would expect that cultural differences would be evident in stratified results. Stratified results could also be the subject of further discussion.

Another limitation was that detailed characteristics of the respondents were not available, apart from country, age, and sex. Therefore, we cannot say, for instance, what proportion of the general population sample were also parents, nor can we perform stratified analyses based on socioeconomic status. The stratification of the results per country indeed revealed a number of differences. However, it should be kept in mind that the present study was largely intended as a proof of principle to demonstrate how the process of generating normalized values (utilities) can take place and to provide a first value set for the IQI. From that perspective, the generalizability of the results is of less importance at this stage. We added sentences about this in the Discussion (lines 371-377).

“Another limitation was that detailed characteristics of the respondents were not available, apart from country, age, and sex. Therefore, we were not able to, for instance, say which part of the general population sample comprised parents, nor could we perform stratified analyses based on socioeconomic status. It should be kept in mind that the present study was largely intended as a proof of principle to demonstrate how the process of generating normalized values (utilities) can take place and to provide a first value set for the IQI. From that perspective, the generalizability of the results is of less importance at this stage.”

The conclusion should provide additional explanation regarding future research and confirmatory studies.

Clinical studies using the IQI are currently ongoing. In some of these, another health instrument is used in parallel to cross-check the validity of the IQI (e.g., whether both instruments confirm that subjects’ health states are improving). Moreover, since the IQI is a “living” instrument, these results will help populate the data set on which the value set was built and refine it over time. We have added a sentence on this point in the Discussion.

Reviewer #3

The sample consisted of members of the general population (n=1409) and infant caregivers (n=1229). Were the members of the general population parents? If not, is there reason to believe that the sample of the general population would answer the questions significantly differently than the infant caregivers?

The general population sample did contain parents, as we aimed for it to be representative of the general population. And yes, we believe there is reason to expect that this sample may have different preferences from those of the primary caregivers. We added the following to the Methods section (2.2 recruitment):

“The general population sample included both parents (of children with varying ages) and respondents without children, to be as representative as possible. The latter were interviewed because they might think differently about the value of life in different health conditions.”

I’m assuming from some of the information that follows in the manuscript, that yes, the subsamples answer some questions differently, but in section 3.2 “states worse than death”, the authors write, “Among the health states in the main study, the one most frequently mentioned as being worse than death was 4241241 (2.2%) for the general population and 4244231 (2.4%) for the primary caregivers.”

We indeed assumed there would be differences, also given the difficulties reported in the literature with estimating utilities for children, as parents were not willing to trade any life years no matter how bad the health state. In the end, as our data show, the two groups turned out not to be so different, which is still an interesting finding to report.

The digits are poorly explained in section 2.1 instrument. A slightly more detailed explanation would be useful for the reader earlier in the paper (perhaps expand the example given in line 113, “e.g., 3231421 equates to moderately affected sleep, slight feeding problems, moderate breathing problems, sleeps well, inconsolable crying, dry or red skin, highly playful/highly interactive.” Because as it is, I’m not even certain that I have interpreted this correctly. Is this correct?

We agree with the reviewer’s suggestion and have expanded the example to include the explanation of the levels (which the reviewer indeed listed correctly, apart from the 4th attribute, which was not ‘sleeps well’ but ‘normal stool/poo’, as sleeping is the 1st attribute), as follows: “… e.g., 3231421, which would equate “moderately affected sleep, slight feeding problems, moderate breathing problems, normal stool/poo, inconsolable crying, dry or red skin, highly playful/highly interactive”.
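The digit notation can be made concrete with a small sketch that splits a 7-digit state into one level per item, using the item order given in the abstract (sleeping, feeding, breathing, stooling/poo, mood, skin, interaction). The function name is hypothetical, and the sketch assumes that every item has four levels, which is consistent with 4444444 being the worst state.

```python
IQI_ITEMS = ("sleeping", "feeding", "breathing", "stooling/poo", "mood", "skin", "interaction")


def parse_iqi_state(state: str) -> dict[str, int]:
    """Map each digit of a 7-digit IQI state to its item, in order.

    Assumes levels 1-4 for every item (illustrative assumption).
    """
    if len(state) != len(IQI_ITEMS) or any(ch not in "1234" for ch in state):
        raise ValueError(f"expected {len(IQI_ITEMS)} digits in the range 1-4, got {state!r}")
    return {item: int(ch) for item, ch in zip(IQI_ITEMS, state)}


for item, level in parse_iqi_state("3231421").items():
    print(f"{item}: level {level}")
```

For 3231421, the fourth digit (1) belongs to “stooling/poo”, which matches the correction above that the fourth attribute is “normal stool/poo” and not a sleep description.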

Figure 2 is confusing because at the top it says, “Suppose that an infant’s first year of life is spent mainly in either State A or State B and that its health is uncertain afterwards.” At the bottom of the figure, “Please indicate if you would consider health state A and B as better or worse than death.” The use of the word “and” at the bottom leads me to believe that the infant has all the conditions in both A and B. Is this because there are 2 options to answer at the bottom?

Health states A and B are two different alternatives, as stressed by EITHER and OR in the initial sentence. An infant might have experienced one OR the other, not both together. After this first question, regardless of which health state the infant experienced (A or B; we do not know and it does not matter), the second question (note that these are two separate, independent questions) asks the respondent to evaluate whether A AND B, each considered separately, are better or worse than death. Given the survey results, most respondents seemed to have understood this well.

“The main limitation associated with this study is that no obvious state worse than death was found.” It seems that the descriptions “Severely disturbed sleep, severe feeding problems, severe breathing problems, severe stool problems, inconsolable crying, bleeding or cracked skin, low-energy/inactive/dull” do not describe the pain of the infant and so do not elicit a response from an adult that there is no obvious state worse than death. Is it possible that the adults who took this survey view the 4’s as problems that occur occasionally and do not see these issues as worse than death? Adults can forget that infants are altricial, making the 4’s much more severe than they are for adults.

This is a rather academic discussion: many of these thoughts are also applicable to generic utility instruments, which rarely specify a time-period when asking to evaluate a health state. The interviewed person can therefore imagine anything about the length of that condition. In the same way, we do not know how the adults have interpreted the survey. What we do know (as it was specified in the question) is that these infants are experiencing that specific health condition in the first year of life and what happens later in time is uncertain.

How do you expect the results would vary if you included an outcome statement after each option (e.g., “severe stool/poo problems, resulting in hospitalization”?

Our way of reasoning was that, by describing a severe health state of an infant, caregivers would realize the potential implications of such a health state. Adding more information about the consequences, such as hospitalization or not, might indeed affect the preferences for one health state over another, but this was not the purpose of the exercise.

Attachment

Submitted filename: Rebuttal IQI utilities.docx

Decision Letter 1

Jing Tian

11 Mar 2020

A two-step procedure to generate utilities for the Infant health-related Quality of life Instrument (IQI)

PONE-D-19-19703R1

Dear Dr. Krabbe,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Jing Tian

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Dear authors, thank you for responding to all of my comments and suggestions in a thorough and robust manner. I now recommend the article for publication. Good luck with your publication and future endeavours with this interesting work. With kindest regards.

Reviewer #3: The authors were careful to respond to all my inquiries and addressed them in the manuscript when necessary for clarification.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Dr Julie A. Campbell

Reviewer #3: Yes: Julie Campbell

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Additional study (Part 1) in which the 7 worst IQI health states (apart from 4444444) were assessed against each other.

    (EPS)

    S2 Fig. Additional study (Part 2) in which the 7 worst IQI health states (apart from 4444444) were assessed against death.

    (EPS)

    S3 Fig. Relationship between the values for the primary caregivers (bottom) and the normalized values for the general population (top).

    (EPS)

    S4 Fig. Normalized values (utilities) per country (same X- and Y-axis).

    (EPS)

    S1 Table. Parameter estimates for the levels of the 7 IQI health items separately for the general population and primary caregivers.

    (DOCX)

    S2 Table. Parameter estimates (values) for the levels of the 7 IQI health items for the primary caregivers, per country.

    (DOCX)

    S3 Table. Parameter estimates (normalized scale = utilities) for the levels of the 7 IQI health items for the primary caregivers, per country.

    (DOCX)

    Attachment

    Submitted filename: Rebuttal IQI utilities.docx

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files.


    Articles from PLoS ONE are provided here courtesy of PLOS
