PLOS Global Public Health. 2022 Feb 2;2(2):e0000141. doi: 10.1371/journal.pgph.0000141

Differences and agreement between two portable hand-held spirometers across diverse community-based populations in the Prospective Urban Rural Epidemiology (PURE) study

MyLinh Duong 1,*, Sumathy Rangarajan 1, Michele Zaman 1, Nafiza Mat Nasir 2, Pamela Seron 3, Karen Yeates 4, Afzalhussein M Yusufali 5, Rasha Khatib 6, Lap Ah Tse 7, Chuangshi Wang 8, Andreas Wielgosz 9, Koon Teo 1, Rajesh Kumar 10, Alvaro Avezum 11, Rosnah Ismail 12, Burcu Tumerdem çalık 13, Soumya Gopakumar 14, Omar Rahman 15, Katarzyna Zatońska 16, Annika Rosengren 17, Johanna Otero 18, Roya Kelishadi 19, Rafael Diaz 20, Thandi Puoane 21, Salim Yusuf 1
Editor: Andre F S Amaral 22
PMCID: PMC10021326  PMID: 36962310

Abstract

Introduction

Portable spirometers are commonly used in longitudinal epidemiological studies to measure and track the forced expiratory volume in the first second (FEV1) and forced vital capacity (FVC). During the course of a study, it may be necessary to replace spirometers with a different model. This raises questions about the comparability of measurements from different devices. We examined the correlation, mean differences and agreement between two different spirometers across diverse populations and participant characteristics.

Methods

From June 2015 to January 2018, a total of 4,603 adults were enrolled from 628 communities in 18 countries and 7 regions of the world. Each participant performed paired measurements with the MicroGP and EasyOne spirometers. Measurements were compared using the intra-class correlation coefficient (ICC) and the Bland-Altman method.

Results

Approximately 65% of the participants achieved clinically acceptable quality measurements. Overall correlations between paired FEV1 (ICC 0.88 [95% CI 0.87, 0.88]) and FVC (ICC 0.84 [0.83, 0.85]) were high. Mean differences between paired FEV1 (-0.038 L [-0.053, -0.023]) and FVC (0.033 L [0.012, 0.054]) were small. The 95% limits of agreement were wide but unbiased (FEV1 -1060 to 984 ml; FVC -1394 to 1460 ml). Similar findings were observed across regions. The source of variation between spirometers was mainly at the participant level. Older age, higher body mass index, tobacco smoking and known COPD/asthma did not adversely affect inter-device variability. Furthermore, mean differences between paired FEV1 and FVC z-scores, derived from the Global Lung Function Initiative normative values, were small and acceptable, suggesting minimal impact on lung function interpretation.

Conclusions

In this multicenter, diverse community-based cohort study, measurements from two portable spirometers showed good correlation and small, unbiased differences. These data support their interchangeable use across diverse populations to provide accurate trends in serial lung function measurements in epidemiological studies.

Introduction

Lung function assessments have become more accessible with the wide adoption of handheld portable spirometers in community and ambulatory care settings. These devices are easy to operate, and many have built-in quality-check software to support high-quality measurements. They are also commonly used in research studies to provide rapid and reliable lung function measurements and to track lung function over time [1]. However, in large multicenter trials, different study sites commonly use different portable spirometers depending on local availability, and it is often necessary to replace older devices with newer models over time [2]. This raises questions about the reliability of, and agreement between, measurements obtained from different devices. It is therefore important to ascertain the reliability, differences and agreement between spirometers, and to identify factors that may contribute to the variability between them.

To date, only a few small studies have examined the variability between different portable spirometers [2–11]. Many were conducted in highly selected, healthy young individuals in laboratory settings. The few conducted in the community were limited to one population (generally from Europe or North America). It is unclear whether these findings can be generalized to other populations with different anthropometrics, demographics and underlying disease prevalence. Furthermore, little is known about the sources of variability between spirometers.

The Prospective Urban Rural Epidemiology (PURE) study is an international prospective cohort study comprising adults recruited from urban and rural communities in high-, middle- and low-income countries. Baseline spirometry data were collected with a handheld portable turbine spirometer that did not provide flow-volume loops (FVL). During cohort follow-up, a new portable ultrasonic spirometer that provided FVL was introduced. In the present study, we examined the correlation, agreement and mean difference between measurements from the old and new spirometers in an unselected sub-sample. We also assessed whether the correlation and agreement between spirometers differ across diverse populations from different socioeconomic and geographic regions. Lastly, we examined the impact of using two different spirometers on the interpretation of spirometry measurements, using the Global Lung Function Initiative (GLI) normative values. Our findings address some of the challenges associated with the widespread use of portable spirometers and their role in providing access to lung function measurements in the community. This information will facilitate correct interpretation of data and offer insight into how best to address the variability between spirometers.

Methods

The PURE study began recruiting community-based adults aged 35 to 70 years in 2004, from 628 urban and rural communities across 18 high-, middle- and low-income countries. The study design and methodology have been described elsewhere [12]. In brief, standardized approaches were used for the enumeration of households, identification of participants, recruitment and data collection. As it was not feasible to collect data from a representative sample of each country, the sampling method used for each country aimed to reduce participation bias based on local risk factors and disease prevalence. Baseline data were collected between 2004 and 2009, and follow-up occurred every 3 years. The study is coordinated by the Population Health Research Institute, Hamilton Health Sciences, McMaster University (Hamilton, ON, Canada). Ethics approval was provided by the Hamilton Health Sciences Research Ethics Board and the research ethics committees of the other participating centers (Appendix I in S1 File). All participants provided written informed consent to participate in the study.

Baseline spirometry was measured with the MicroGP spirometer (MicroMedical, Chatham, IL, USA), without FVL, following the 2005 American Thoracic Society/European Respiratory Society (ATS/ERS) spirometry standardization guidelines [13]. The MicroGP spirometer contains a turbine, which generates rotational flow during the spirometry maneuver. The rotation of the low-inertia vane is converted into electrical pulses by an infrared light-emitting diode and a photodiode sensor. A microprocessor within the device converts the electrical pulses into spirometry measurements, which are displayed digitally. According to the manufacturer, the MicroGP has an accuracy of ±2%. In 2015, the EasyOne ultrasonic spirometer (ndd Medical Technologies, Inc., Switzerland) was introduced, which provides automated quality checks, messaging, quality grades and FVL. The quality grades provided by the EasyOne after each test session are: (1) Grade A or B for three acceptable efforts with <100 ml (Grade A) or <150 ml (Grade B) variability between the two highest FEV1 and FVC values; (2) Grade C for two or more acceptable efforts with <200 ml variability; (3) Grade D for one acceptable effort or highly variable efforts (≥200 ml); and (4) Grade F for no acceptable efforts. The EasyOne spirometer uses an ultrasonic sensor to measure airflow. It has no moving parts, and its accuracy does not depend on mechanical function or on the measurement of pressure or volume displacement. According to the manufacturer, its accuracy is within 3%, which is maintained throughout its operational life without the need for regular calibration.
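The grading rules listed above can be expressed as a small decision function. The sketch below is a hypothetical re-implementation of the grades exactly as summarized in this paragraph, not the EasyOne firmware. The input format (a list of `(fev1_ml, fvc_ml, acceptable)` tuples), the function name `session_grade`, and the treatment of each effort's acceptability as already decided are all assumptions. "Variability" is taken here as the larger of the FEV1 and FVC gaps between the two highest acceptable values.

```python
from typing import List, Tuple

# Hypothetical input format: one tuple per effort -> (FEV1 in ml, FVC in ml, acceptable?)
Effort = Tuple[int, int, bool]


def _top_two_gap(values: List[int]) -> int:
    """Difference between the highest and second-highest value."""
    ranked = sorted(values, reverse=True)
    return ranked[0] - ranked[1]


def session_grade(efforts: List[Effort]) -> str:
    """Grade a session with the A-F rules as described in the text (illustrative sketch)."""
    acceptable = [(fev1, fvc) for fev1, fvc, ok in efforts if ok]
    if not acceptable:
        return "F"  # no acceptable efforts
    if len(acceptable) == 1:
        return "D"  # only one acceptable effort
    # Repeatability: the worse of the FEV1 and FVC gaps between the two highest values
    variability = max(
        _top_two_gap([fev1 for fev1, _ in acceptable]),
        _top_two_gap([fvc for _, fvc in acceptable]),
    )
    if len(acceptable) >= 3 and variability < 100:
        return "A"
    if len(acceptable) >= 3 and variability < 150:
        return "B"
    if variability < 200:
        return "C"  # two or more acceptable efforts with <200 ml variability
    return "D"  # highly variable efforts (>=200 ml)


# Three acceptable efforts whose top two values differ by 50 ml (FEV1) and 40 ml (FVC)
print(session_grade([(2500, 3100, True), (2450, 3060, True), (2400, 3000, True)]))  # prints A
```

Under these assumed rules, three acceptable efforts with a 120 ml gap would be graded B, and two acceptable efforts with a 180 ml gap would be graded C.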

All study visits were conducted in dedicated research clinics in the community at all sites and in all countries. Participants were coached by trained staff before performing pre-bronchodilator forced inspiratory and expiratory maneuvers (up to six attempts). All tests were performed standing, with the participant's back straight and wearing a nose clip. With the introduction of the EasyOne spirometer, each center enrolled the first five consecutive participants from each community into the present substudy. Each participant provided spirometry measurements with both devices in random order within 3 hours, supervised by the same research staff member. The order of spirometer measurements was randomly generated by the coordinating site and issued to the center before the day of testing. Spirometers were calibrated monthly (or as needed after extreme weather or handling) using a 3 L syringe to ensure accuracy within 105 ml (3.5%).
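Two procedural details in this paragraph lend themselves to a short sketch: the calibration tolerance (3.5% of a 3 L syringe stroke is 105 ml) and a pre-generated random device order. Both functions below are hypothetical illustrations; their names, the seeded generator and the strict "less than" comparison are assumptions, not the study's actual procedures or software.

```python
import random

SYRINGE_ML = 3000        # 3 L calibration syringe
TOLERANCE_PERCENT = 3.5  # 3.5% of 3000 ml = 105 ml


def calibration_passes(measured_ml: float, syringe_ml: float = SYRINGE_ML,
                       tolerance_percent: float = TOLERANCE_PERCENT) -> bool:
    """Hypothetical check: True if a syringe stroke reads within <105 ml of 3 L."""
    limit_ml = syringe_ml * tolerance_percent / 100
    return abs(measured_ml - syringe_ml) < limit_ml


def device_orders(n_participants: int, seed: int) -> list:
    """Hypothetical pre-generated testing order (first device, second device) per participant."""
    rng = random.Random(seed)  # seeded so the coordinating site could regenerate the same list
    devices = ("MicroGP", "EasyOne")
    return [tuple(rng.sample(devices, k=2)) for _ in range(n_participants)]


print(calibration_passes(3090))   # 90 ml error, within tolerance
print(device_orders(5, seed=2015))
```

Seeding the generator is only one way to make an allocation list reproducible for audit; the text states only that the order was generated centrally before testing.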

Statistical analysis

Means and frequencies were used to describe the data. The highest FEV1 and FVC from each spirometer were analyzed. The assumptions of normality and constant variance of FEV1 and FVC were assessed by visual inspection of histograms and of plots of residuals against fitted values. The correlation and agreement between spirometers were assessed with scatterplots, intra-class correlation coefficients (ICC) and Bland-Altman plots [14]. Mean differences between paired FEV1 and paired FVC were calculated as absolute (EasyOne - MicroGP) and relative (((EasyOne - MicroGP)/average) × 100) differences between spirometers. A random-intercept multilevel "null" model was used to estimate the sources of variation between spirometers at the region, country, center and participant levels. Stratified analyses by region, sex, age, body mass index (BMI), smoking status, known COPD or asthma, education level and quality grade were performed to explore the effect of each factor on the reliability and agreement between spirometers. Countries were classified into seven regions according to geographic location and socioeconomic level (World Bank classification) [15]. To examine the impact on interpretation, the GLI normative values were used to transform FEV1 and FVC into z-scores before Bland-Altman analysis [16]. We used the ATS/ERS criterion for between-effort repeatability within a test session (<150 ml) to assess whether mean differences between spirometers met the criterion for within-test reproducibility [17]. Similarly, a z-score difference of <0.5 SD was regarded as not meaningful for GLI values adjusted for age, sex, height and ethnicity [18]. All analyses were performed using SAS version 9.4 (SAS Institute, Cary, NC, USA) and Stata 15 (StataCorp LLC, Texas, USA).
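The absolute and relative differences and the 95% limits of agreement described above can be sketched with Python's standard library. This is a minimal illustration assuming the conventional Bland-Altman limits (mean difference ± 1.96 SD of the paired differences) and the EasyOne minus MicroGP sign convention used in the text; it is not the authors' SAS or Stata code, and the function names and toy data are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def bland_altman(easyone: Sequence[float], microgp: Sequence[float],
                 z: float = 1.96) -> Tuple[float, float, Tuple[float, float]]:
    """Mean difference, SD of differences and 95% limits of agreement (hypothetical helper)."""
    if len(easyone) != len(microgp) or len(easyone) < 2:
        raise ValueError("need at least two paired measurements of equal length")
    diffs = [e - m for e, m in zip(easyone, microgp)]  # EasyOne minus MicroGP
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return mean_diff, sd_diff, (mean_diff - z * sd_diff, mean_diff + z * sd_diff)


def relative_differences(easyone: Sequence[float], microgp: Sequence[float]) -> List[float]:
    """Per-pair relative difference in %: (EasyOne - MicroGP) / pair average * 100."""
    return [(e - m) / ((e + m) / 2) * 100 for e, m in zip(easyone, microgp)]


# Toy FEV1 values in ml, invented for illustration (not study data)
easy = [2100, 2500, 3000, 1800]
micro = [2000, 2600, 2900, 1800]
mean_diff, sd_diff, (lower, upper) = bland_altman(easy, micro)
print(f"mean difference {mean_diff:.0f} ml, LoA {lower:.0f} to {upper:.0f} ml")
# Compare the mean difference with the 150 ml ATS/ERS repeatability criterion
print("within 150 ml criterion:", abs(mean_diff) < 150)
```

For z-scores, the same structure applies with the 0.5 SD threshold in place of 150 ml.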

Results

A total of 4,603 participants from 628 communities in 18 countries across 7 regions completed measurements with both spirometers. Baseline characteristics of included participants are shown in Table 1. As in the larger PURE study (Appendix II in S1 File), there were more females and more individuals aged 50–65 years. The overall proportion of participants meeting quality grades A, B or C on the EasyOne device was 65%, similar to the larger PURE study. There was a trend toward a higher prevalence of comorbidities, including COPD/asthma and cardiac disease, and a lower education level in the substudy.

Table 1. Baseline characteristics by region.

Overall S Asia China S East Asia Africa Middle East S America N Am/Eur
N (total) 4603 566 (12.3) 631 (13.7) 191 (4.1) 368 (8) 433 (9.4) 1,578 (34.3) 836 (18.2)
Females 2,840 (61.7) 335 (59.2) 351 (55.6) 101 (52.9) 257 (69.8) 306 (70.7) 1,044 (66.2) 446 (53.3)
Urban 2,363 (51.4) 332 (58.8) 119 (18.9) 54 (28.3) 165 (44.8) 243 (56.1) 854 (54.1) 596 (71.3)
Age, years 49.6 ± 9.2 46.2 ± 9.2 50.0 ± 8.5 52.9 ± 8.9 49.2 ± 9.2 47.0 ± 9.2 50.3 ± 9.1 52.2 ± 9.0
Weight, kg 70.2 ± 15.9 61.2 ± 12.6 64.7 ± 11.6 64.9 ± 13.5 67.3 ± 18.4 78.8 ± 15.9 70.3 ± 14.5 78.4 ± 16.8
BMI, kg/m² 27 ± 5.5 24.8 ± 4.8 25.1 ± 3.8 26.1 ± 4.9 26.5 ± 7.4 30.6 ± 5.9 27.7 ± 5.1 28.1 ± 5.5
Height, cm 160.6 ± 9.6 156.9 ± 8.9 160.3 ± 8.4 157.5 ± 9.2 159.8 ± 8.5 160.5 ± 8.7 159.3 ± 9.2 166.8 ± 9.9
COPD/asthma 293 (6.4) 19 (3.4) 36 (5.7) 11 (5.8) 11 (3.0) 36 (8.3) 72 (4.6) 108 (12.9)
Cardiac disease 253 (5.5) 33 (5.8) 55 (8.7) 5 (2.6) 11 (3) 31 (7.2) 66 (4.2) 52 (6.2)
Strokes 110 (2.4) 16 (2.8) 33 (5.2) 4 (2.1) 3 (0.8) 6 (1.4) 30 (1.9) 18 (2.1)
Smoking status
• current 628 (13.7) 88 (15.6) 119 (19.1) 25 (13.1) 107 (29.1) 40 (9.2) 155 (9.8) 94 (11.3)
• ex-smokers 954 (20.8) 27 (4.8) 145 (23.2) 27 (14.1) 20 (5.4) 26 (6) 429 (27.2) 280 (33.5)
• never 3,013 (65.5) 451 (79.6) 360 (57.7) 139 (72.8) 241 (65.5) 367 (84.8) 994 (63) 461 (55.2)
Education
primary/below 2,368 (51.7) 314 (56) 322 (51.4) 87 (45.5) 280 (77.8) 197 (45.5) 1027 (65.2) 141 (16.9)
secondary/above 2,213 (48.3) 247 (44) 305 (48.6) 104 (54.5) 80 (22.2) 236 (54.5) 549 (34.8) 692 (83.1)
FEV1, L, Micro 2.21 ± 0.8 1.76 ± 0.72 2.10 ± 0.65 1.90 ± 0.72 1.87 ± 0.67 2.26 ± 0.78 2.24 ± 0.79 2.74 ± 0.77
z score, Micro -1.06 ± 1.74 -1.52 ± 1.56 -1.46 ± 1.62 -1.52 ± 1.56 -1.05 ± 1.67 -1.36 ± 1.58 -0.82 ± 1.86 -0.42 ± 1.30
FEV1, L, Easy 2.18 ± 0.7 1.71 ± 0.56 2.11 ± 0.65 1.84 ± 0.60 1.80 ± 0.61 2.23 ± 0.69 2.20 ± 0.72 2.68 ± 0.74
z score, Easy -1.18 ± 1.45 -1.54 ± 1.14 -1.31 ± 1.59 -1.54 ± 1.14 -1.18 ± 1.48 -1.58 ± 1.32 -1.08 ± 1.49 0.53 ± 1.16
FVC, L, Micro 2.71 ± 1.1 1.98 ± 0.95 2.65 ± 0.85 2.22 ± 0.79 2.38 ± 0.96 2.90 ± 1.13 2.69 ± 1.02 3.46 ± 1.01
z score, Micro -1.21 ± 1.85 -1.68 ± 1.63 -1.52 ± 1.83 -1.69 ± 1.63 -1.04 ± 1.80 -1.32 ± 1.94 -1.22 ± 1.89 -0.38 ± 1.27
FVC, L, Easy 2.81 ± 0.9 2.17 ± 0.70 2.74 ± 0.81 2.29 ± 0.70 2.41 ± 0.73 2.73 ± 0.83 2.83 ± 0.85 3.44 ± 0.92
z score, Easy -1.13 ± 1.54 -1.79 ± 1.25 -1.35 ± 1.79 -1.79 ± 1.24 -0.92 ± 1.48 -1.81 ± 1.31 -0.91 ± 1.47 -0.49 ± 1.08
Grades A-B 2171 (47.2) 223 (39.5) 392 (62.1) 92 (48.1) 152 (41.3) 214 (49.4) 548 (34.7) 551 (65.9)
Grade C 783 (17) 92 (16.3) 100 (15.8) 38 (20) 71 (19.3) 60 (13.9) 309 (19.6) 113 (13.5)
Grade D 1170 (25.4) 147 (26) 123 (19.5) 43 (22.5) 95 (25.8) 102 (23.6) 534 (33.9) 126 (15.1)
Grade F 457 (9.9) 96 (17) 14 (2.2) 18 (9.4) 50 (13.6) 56 (12.9) 185 (11.7) 38 (4.5)

Variables are presented as means ± SD for continuous data and as counts (% of total in each region/column) for categorical data. Abbreviations: BMI = body mass index, calculated as weight (kg) divided by height squared (m²); COPD (chronic obstructive pulmonary disease)/asthma, CHF (congestive heart failure) and strokes were self-reported; FEV1 = forced expiratory volume in the first second, in liters (L); FVC = forced vital capacity, in liters (L); z-scores were estimated using the Global Lung Function Initiative normative values appropriate for age, sex, height and ethnicity; Micro = MicroGP spirometer; Easy = EasyOne spirometer. Regions: S = South; N Am/Eur = North America/Europe. Quality grades were assigned by the EasyOne spirometer according to ATS guidelines.

The correlations, mean differences and limits of agreement (LoA) between paired FEV1 and FVC by region are shown in Table 2. Overall, paired FEV1 and FVC measurements from the two spirometers were highly correlated (Fig 1). The overall mean differences between spirometers, whether expressed as absolute volume or as a percentage of mean FEV1 or FVC, were small and within acceptable limits of between-effort reproducibility (Fig 2). The 95% LoA between paired measurements were wide and showed no association with the magnitude of FEV1 or FVC. Correlations between paired FEV1 and FVC were similarly high across regions except South Asia, where correlations were low to moderate (Table 2). For South America and the Middle East, correlations between paired FVC were lower than those between paired FEV1. Across regions, mean differences between paired FEV1 were small (ranging from -83 ml [relative difference -4%] to 49 ml [2.5%]) and showed no consistent bias. Mean differences between paired FVC were larger, particularly for the Middle East (-203 ml [-6%]) and South America (141 ml [6.3%]), and again showed no consistent bias across regions. The 95% LoA were wide for both FEV1 and FVC, suggesting large variation in agreement between spirometers, which also differed across regions.
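The text does not state which ICC form was used. As one common choice, the sketch below computes a one-way random-effects ICC(1,1) for two measurements per participant from the between-participant and within-participant mean squares; the choice of model, the function name and the toy data are assumptions, so the values need not match those in Table 2.

```python
import statistics
from typing import Sequence, Tuple


def icc_oneway(pairs: Sequence[Tuple[float, float]]) -> float:
    """One-way random-effects ICC(1,1) for k = 2 measurements per participant (assumed model)."""
    k = 2
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two participants")
    subject_means = [(a + b) / k for a, b in pairs]
    grand_mean = statistics.fmean(subject_means)
    # Between-participant mean square
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-participant (between-device) mean square
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for (a, b), m in zip(pairs, subject_means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Toy paired FEV1 values in liters (EasyOne, MicroGP), invented for illustration
pairs = [(2.10, 2.00), (2.50, 2.60), (3.00, 2.90), (1.80, 1.80), (2.70, 2.75)]
print(f"ICC(1,1) = {icc_oneway(pairs):.2f}")
```

Because between-participant spread is large relative to the between-device differences in this toy data, the ICC is close to 1; a two-way model would additionally separate a systematic device effect.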

Table 2. Correlations, mean differences and agreement between spirometers by region.

  OVERALL S Asia China SE Asia Africa Middle East S America N Am / Eur
N 4603 566 631 191 368 433 1578 836
CORRELATIONS (ICC, 95%CI)
FEV1 0.88 (0.87, 0.88) 0.67 (0.61, 0.72) 0.93 (0.91, 0.94) 0.89 (0.85, 0.92) 0.83 (0.79, 0.86) 0.83 (0.80, 0.86) 0.82 (0.80, 0.94) 0.94 (0.93, 0.95)
FVC 0.84 (0.83, 0.85) 0.53 (0.45, 0.61) 0.92 (0.91, 0.93) 0.88 (0.85, 0.91) 0.78 (0.73, 0.82) 0.71 (0.64, 0.76) 0.74 (0.71, 0.77) 0.94 (0.93, 0.95)
BLAND-ALTMAN ANALYSIS: mean differences (95% CI) and 95% limits of agreement (95% CI)
FEV1, ml -38 (-53, -23) -9 (-64, 47) 49 (22, 76) -3 (-47, 41) -37 (-88, 14) -83 (-135, -31) -74 (-104, -45) -42 (-68, -17)
Upper LoA 984 (958, 1010) 1300 (1205, 1394) 725 (678, 771) 1194 (1119, 1269) 938 (851, 1026) 984 (895, 1072) 1082 (1032, 1132) 696 (653, 740)
Lower LoA -1060 (-1085, -1034) -1317 (-1412, -1222) -626 (-673, -580) -1200 (-1275, -1125) -1012 (-1099, -924) -1150 (-1239, -1062) -1231 (-1281, -1181) -781 (-825, -737)
FVC, ml 33 (12, 54) 38 (-40, 117) 51 (18, 84) 19 (-42, 79) 60 (-10, 131) -203 (-284, -122) 141 (99, 182) -59 (-89, -30)
Upper LoA 1460 (1424, 1496) 1851 (1717, 1984) 887 (830, 944) 1655 (1551, 1759) 1405 (1285, 1526) 1476 (1337, 1615) 1781 (1712, 1852) 797 (746, 847)
Lower LoA -1394 (-1430, -1358) -1774 (-1908, -1640) -785 (-842, -728) -1618 (-1723, -1514) -1285 (-1405, -1164) -1882 (-2021, -1743) -1500 (-1571, -1429) -915 (-966, -865)
FEV1, % -1.5% (-2.3, -0.8) -1% (-2.1, 4.3) 2.5% (1.1, 4) 1.2% (-1.3, 3.7) -2.2% (-4.9, 0.4) -4% (-6.5, -1.6) -3.7% (-5.2, -2.3) -1.3% (-2.5, -0.1)
Upper LoA 51 (50, 52) 76 (70, 81) 39 (37, 42) 69 (65, 73) 48 (44, 53) 47 (43, 51) 54 (51, 56) 32 (30, 34)
Lower LoA -54 (-55, -53) -73 (-79, -68) -34 (-37, -32) -67 (-71, -63) -53 (-57, -48) -55 (-59, -51) -61 (-64, -98) -34 (-36, -34)
FVC, % 2.3% (1.5, 3.1) 3.3% (-0.1, 6.7) 2.3% (0.9, 3.7) 2.2% (-0.4, 4.9) 3.6% (0.7, 6.5) -6% (-9, -3) 6.3% (5, 8) -1.6% (-2.4, -0.7)
Upper LoA 56 (55, 58) 83 (77, 89) 37 (35, 40) 74 (69, 78) 60 (55, 67) 54 (49, 59) 65 (62, 67) 24 (22, 25)
Lower LoA -52 (-53, -50) -76 (-82, -70) -33 (-35, -30) -69 (-74, -65) -52 (-57, -47) -66 (-71, -61) -52 (-55, -50) -27 (-28, -25)

N = number of included participants within each region. The mean absolute difference (EasyOne minus MicroGP) and mean relative difference (((EasyOne minus MicroGP)/average) × 100) between spirometers are provided with 95% CI. The 95% upper and lower limits of agreement (LoA) are provided with their 95% CI. Regions: S = South; SE = South East; N Am/Eur = North America/Europe.

Fig 1. Overall correlation between paired FEV1 and FVC from the microGP and EasyOne spirometers.

ICC = intraclass correlation coefficient (with 95% CI) for paired FEV1 (L) and FVC (L) measured within 3 hours in random order. All measurements were supervised by the same trained study coordinator. The line of identity, representing perfect agreement between paired measurements, is shown.

Fig 2. Bland-Altman plots for paired FEV1 and FVC measured by the MicroGP and EasyOne spirometers.

Differences between paired FEV1 and FVC were calculated as the absolute difference (EasyOne minus MicroGP) in Panels A and B, or the relative difference (((EasyOne minus MicroGP)/average) × 100) in Panels C and D, and plotted against the average ((EasyOne + MicroGP)/2) on the x-axis. The 95% limits of agreement (LoA) are shown (blue lines). The 95% CIs for the mean differences and LoA are also shown (broken lines).

To understand the sources of variation between spirometers, the ICC and variance components were estimated at the region, country, center and participant levels (Table 3). The highest ICCs for paired FEV1 and FVC were observed at the participant level, indicating that measurements from the two spirometers were highly correlated within individuals. The participant level also had the largest variance component, suggesting that participant factors contributed substantially to the variation between spirometers. The ICCs and variance components at the region, country and center levels were substantially smaller, suggesting that these levels contributed much less to the variation between spirometers. Furthermore, the ICC increased only modestly from the region to the country and center levels, compared with the large increase from the center to the participant level. This further highlights the importance of participant factors in contributing to the variation between spirometers.

Table 3. ICCs and variance estimates between spirometers at the region, country, center and individual levels.

FEV1 FVC
Levels ICC (95% CI) Variance (95% CI), L² ICC (95% CI) Variance (95% CI), L²
Region 0.133 (0.039, 0.361) 0.084 (0.023, 0.306) 0.146 (0.042, 0.399) 0.148 (0.039, 0.559)
Country within region 0.168 (0.069, 0.356) 0.023 (0.003, 0.155) 0.202 (0.087, 0.402) 0.056 (0.011, 0.291)
Centers within countries 0.208 (0.104, 0.373) 0.025 (0.009, 0.012) 0.248 (0.129, 0.422) 0.046 (0.022, 0.096)
Individuals within centers 0.783 (0.743, 0.819) 0.366 (0.348, 0.384) 0.730 (0.673, 0.781) 0.488 (0.463, 0.515)
Residual error 0.138 (0.132, 0.143) 0.273 (0.262, 0.284)

ICC = intraclass correlation coefficient.
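The ICCs in Table 3 appear to be cumulative: the ICC reported for a level equals the share of the total variance (all random-effect variances plus the residual) lying at that level or above, which is the expected correlation between two measurements that share a unit at that level. This reading is an assumption, not stated in the text, but recomputing the FEV1 column from the printed variance estimates reproduces the reported ICCs to within rounding. The helper name below is hypothetical.

```python
from typing import Dict, List, Tuple


def cumulative_iccs(components: List[Tuple[str, float]]) -> Dict[str, float]:
    """ICC per level from variance components ordered top level first, residual last.

    The ICC at a level is the proportion of total variance at that level or above
    (assumed interpretation of the multilevel "null" model output).
    """
    total = sum(var for _, var in components)
    iccs: Dict[str, float] = {}
    running = 0.0
    for level, var in components[:-1]:  # the residual term has no ICC of its own
        running += var
        iccs[level] = running / total
    return iccs


# FEV1 variance components (L^2) as printed in Table 3
fev1_components = [("region", 0.084), ("country", 0.023), ("center", 0.025),
                   ("individual", 0.366), ("residual", 0.138)]
for level, icc in cumulative_iccs(fev1_components).items():
    print(f"{level:<11}{icc:.3f}")  # approx. 0.13, 0.17, 0.21, 0.78
```

The same calculation on the FVC variances (0.148, 0.056, 0.046, 0.488, 0.273) gives approximately 0.15, 0.20, 0.25 and 0.73, again in line with the table.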

To explore participant factors that may contribute to the variation between spirometers, we compared the baseline characteristics of participants whose inter-device differences fell within versus outside the overall 95% LoA (Appendix III in S1 File). The distributions of age, body mass index and sex were similar between these two groups. Furthermore, COPD/asthma, cardiac disease, stroke and tobacco smoking did not adversely affect agreement between spirometers. However, participants outside the 95% LoA more often had lower quality grades and a lower education level. Separate stratified analyses were conducted to further explore the effects of sex, age, BMI, smoking status, known COPD or asthma, education and quality grade on spirometer variability (Table 4, Appendix IV in S1 File). Correlations between paired FEV1 and FVC were generally high and similar across strata. Mean differences between spirometers were small, with minimal variation across strata, even for the lower quality grades. However, correlations were lower, and variability and LoA larger, among participants with a lower education level and those with lower quality grades.

Table 4. Stratified analyses by demographic, anthropometric, clinical characteristics and quality grades.

SEX N Correlation mean Diff, ml LoA, ml mean Diff, % LoA, %
FEV1 Females 2,832 0.80 (0.78, 0.82) -60 (-78, -42) -1017, 897 -3 (-4, -2) -54, 49
Males 1,756 0.87 (0.86, 0.89) -2 (-28, 25) -1117, 1114 0.7 (-0.6, 2) -52, 54
FVC Females 2,827 0.70 (0.68, 0.72) -9 (-35, 19) -1455, 1438 1.0 (-0.09, 2) -56, 58
Males 1,758 0.86 (0.84, 0.87) 101 (68, 134) -1284, 1486 4.4 (3.2, 5.6) -45, 54
AGE N Correlation mean Diff, ml LoA, ml mean Diff, % LoA, %
FEV1 <50 y 950 0.88 (0.86, 0.89) -12 (-46, 22) -1071, 1047 0.2 (-1.5, 1.9) -51, 51
50–65 y 2494 0.87 (0.86, 0.88) -31 (-52, -10) -1058, 996 -1.3 (-2.3, -0.2) -53, 50
>65 y 1147 0.87 (0.85, 0.88) -74 (-103, -45) -1049, 901 -3.6 (-5, -2) -58, 51
FVC <50 y 793 0.84 (0.82, 0.86) 56 (8, 105) -1431, 1544 3.4 (1.7, 5) -48, 55
50–65 y 2483 0.84 (0.82, 0.85) 37 (9, 65) -1363, 1437 2.3 (1.3, 3.4) -50, 55
>65 y 1,094 0.82 (0.80, 0.84) 7 (-36, 49) -1427, 1441 1.3 (-0.5, 3) -58, 60
BMI N Correlation mean Diff, ml LoA, ml mean Diff, % LoA, %
FEV1 <18 100 0.80 (0.70, 0.86) -119 (-246, 8.6) -1374, 1137 -8 (-15, -0.6) -78, 63
18–25 1588 0.88 (0.87, 0.89) -20 (-45, 5) -1024, 984 -0.7 (-2, 0.6) -53, 51
>25 2883 0.87 (0.86, 0.88) -43 (-62, -24) -1060, 975 -1.6 (-2.6, -0.7) -53, 50
FVC <18 99 0.71 (0.57, 0.80) -98 (-291, 96) -2000, 1805 -2 (-9, 6) -77, 75
18–25 1577 0.85 (0.83, 0.86) 76 (41, 111) -1326, 1478 3.7 (2.4, 5) -51, 58
>25 2877 0.84 (0.83, 0.85) 17 (-9, 43) -1397, 1431 1.7 (0.8, 2.7) -51, 54
SMOKING N Correlation mean Diff, ml LoA, ml mean Diff, % LoA, %
FEV1 Never 2,998 0.86 (0.85, 0.87) -34 (-53, -14) -1101, 1033 -1.3 (-2.3, -0.3) -56, 53
Ever 1582 0.90 (0.89, 0.91) -46 (-70, -23) -979, 886 -2 (-3, -0.8) -50, 46
FVC Never 2984 0.82 (0.81, 0.83) 11 (-17, 38) -1478, 1497 2 (0.6, 2.7) -55, 59
Ever 1576 0.86 (0.85, 0.88) 75 (42, 108) -1230, 1380 3 (2, 5) -45, 52
COPD/ ASTHMA N Correlation mean Diff, ml LoA, ml mean Diff, % LoA, %
FEV1 No 4295 0.87 (0.86, 0.88) -38 (-53, -22) -1073, 998 -1.4 (-2.2, -0.6) -54, 52
Yes 293 0.93 (0.92, 0.95) -41 (-88, 5) -832, 749 -3.4 (-5.8, -0.9) -46, 39
FVC No 4275 0.83 (0.82, 0.84) 37 (15, 59) -1405, 1480 2.5 (1.6, 3.3) -52, 57
Yes 293 0.90 (0.87, 0.92) -23 (-92, 46) -1201, 1154 -0.3 (-3, 2.5) -48, 48
EDUCATION N Correlation mean Diff, ml LoA, ml mean Diff, % LoA, %
FEV1 Low 2,362 0.82 (0.80, 0.83) -68 (-92, -46) -1189, 1052 -3.3 (-4.4, -2.1) -61, 54
High 2,207 0.91 (0.90, 0.92) -5 (-24, 14) -904, 894 0.3 (-0.7, 1.2) -46, 47
FVC Low 2,360 0.76 (0.74, 0.77) 18 (-15, 51) -1566, 1602 1.6 (0.3, 2.9) -61, 64
High 2,205 0.88 (0.87, 0.89) 49 (22, 77) -1230, 1329 2.8 (1.8, 3.8) -43, 49
QUALITY GRADES N Correlation mean Diff, ml LoA, ml mean Diff, % LoA, %
FEV1 A 1,555 0.88 (0.87, 0.89) -39 (-58, -19) -811, 734 -1.2 (-2.2, -0.2) -39, 37
B 611 0.90 (0.88, 0.91) -16 (-15, 45) -729, 759 1.2 (-0.2, 2.7) -36, 38
C 779 0.81 (0.78, 0.83) -17 (-49, 16) -924, 890 -0.2 (-1.8, 1.4) -46, 45
D/F 1,637 0.65 (0.62, 0.68) -67 (-100, -35) -1391, 1256 -3.4 (-5.2, -1.7) -73, 66
FVC A 1,551 0.78 (0.76, 0.80) -27 (-55, 2) -1152, 1099 -0.01 (-1, 1) -42, 42
B 611 0.84 (0.79, 0.85) 45 (1, 89) -1044, 1134 2.7 (1.1, 4.4) -38, 44
C 776 0.76 (0.62, 0.77) 12 (-35, 59) -1289, 1313 1.7 (-0.02, 3.4) -46, 49
D/F 1,624 0.61 (0.57, 0.64) 95 (51, 140) -1702, 1893 4.5 (2.8, 6.2) -65, 74

Analyses were stratified by sex; age; body mass index (BMI); smoking (Ever includes current and ex-smokers of tobacco products); known self-reported COPD or asthma; education level (low = primary school or lower; high = secondary school or higher); and quality grades provided by the EasyOne spirometer. The mean absolute difference (EasyOne minus MicroGP) and mean relative difference (((EasyOne minus MicroGP)/average) × 100) between spirometers are provided with 95% CI. The 95% upper and lower limits of agreement (LoA) are provided. These data are also represented graphically in Appendix IV in S1 File.

Similar Bland-Altman analyses were conducted on FEV1 and FVC z-scores derived from GLI normative values appropriate for age, sex, height and ethnicity (Table 5). Mean differences between paired FEV1 and FVC z-scores from the two spirometers were small (below 0.5 SD) for the overall substudy and across regions.

Table 5. Mean differences and agreement between Z-scores from the two spirometers for overall study population and by region.

OVERALL S Asia China SE Asia Africa Middle East S America N Am/Eur
N 4574 550 630 191 368 428 1571 836
FEV1 (95% LOA) -0.13 (-3.0, 2.7) -0.088 (-3.6, 3.4) 0.15 (-1.9, 2.2) -0.021 (-2.3, 2.3) -0.13 (-2.9, 2.6) -0.22 (-3.1, 2.7) -0.25 (-3.6, 3.1) -0.10 (-1.7, 1.4)
FVC (95% LOA) 0.073 (-3.3, 3.4) 0.055 (-4.6, 4.7) 0.17 (-2.6, 2.9) -0.11 (-2.5, 2.2) 0.12 (-3.0, 3.3) -0.49 (-4.2, 3.2) 0.31 (-3.4, 4.0) -0.11 (-1.8, 1.6)

FEV1 = forced expiratory volume in the first second; FVC = forced vital capacity; LOA = 95% limits of agreement; z-scores were estimated using the Global Lung Function Initiative normative values appropriate for age, sex, height and ethnicity. Mean differences and LOA were calculated using the Bland-Altman approach.
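GLI reference values are expressed through the LMS method, in which each participant's predicted median (M), coefficient of variation (S) and skewness (L) depend on age, sex, height and ethnicity. The sketch below shows only the final conversion step, z = ((y/M)^L - 1)/(L*S), assuming L, M and S have already been obtained from the GLI equations; the parameter values and function name are invented for illustration. Because both devices are compared within the same participant, M, S and L are identical for the two readings, so the z-score difference reflects only the difference in measured volume.

```python
import math


def lms_zscore(measured: float, l: float, m: float, s: float) -> float:
    """z-score via the LMS method: ((y/M)**L - 1) / (L*S), or ln(y/M)/S when L == 0."""
    if measured <= 0 or m <= 0 or s <= 0:
        raise ValueError("measured, M and S must be positive")
    if l == 0:
        return math.log(measured / m) / s
    return ((measured / m) ** l - 1) / (l * s)


# Hypothetical reference parameters for one participant (not real GLI output)
L, M, S = 1.0, 3.00, 0.15
z_easy = lms_zscore(2.88, L, M, S)   # EasyOne FVC, liters (invented)
z_micro = lms_zscore(2.91, L, M, S)  # MicroGP FVC, liters (invented)
delta = z_easy - z_micro
print(f"delta z = {delta:.2f}; below 0.5 threshold: {abs(delta) < 0.5}")
```

With these invented values, a 30 ml difference between devices corresponds to a z-score difference of about 0.07, well under the 0.5 SD threshold used in the text.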

Discussion

In this large international, multicenter, community-based substudy, we examined the correlation, mean differences and agreement between measurements from two portable spirometers commonly used in community and field studies, and how these may vary across diverse populations. On average, 65% of sessions achieved quality grades A, B or C, which represent clinically acceptable efforts. Overall correlations between paired FEV1 and between paired FVC from the two spirometers were high. Overall mean differences between measurements were small and within acceptable limits of between-effort reproducibility. Correlations between spirometers were moderate to high across diverse populations from different geographic and socioeconomic regions. Mean differences between paired FEV1 were uniformly small across regions, whereas larger differences between paired FVC were observed. In both cases, no systematic bias was observed across regions. The main source of variation between spirometers was at the participant level, with much less variation among regions, countries and centers. Exploratory analyses of participant factors identified low education level and poor-quality efforts as associated with higher variability between spirometers.

As portable spirometers become widely adopted in the community, more information is needed on their measurement quality, reliability, biases and agreement, to enable correct interpretation and comparison of lung function data across spirometers.

To date, most studies have compared different portable devices in highly selected, healthy and mainly young non-smokers within a single population [2–11]. These studies reported high correlation and agreement between devices, which are likely to be inflated given the controlled settings in which the comparisons were made. The relatively small sample sizes and homogeneity of the populations studied also limited the ability of prior studies to adequately address the sources of variation between spirometers. In contrast, we examined two commonly used portable spirometers in a large number of unselected individuals from a wide range of urban and rural communities and geographic regions. The measurements were collected outside a controlled laboratory setting, which should make our findings more generalizable to a broader range of populations and settings.

Similar to other community-based studies, we found that 25% to 35% of efforts had suboptimal quality grades [19]. Even with these data included, there were high correlations and small mean differences between paired FEV1 across regions. For paired FVC, the correlations and mean differences between spirometers varied more. However, for most regions, the mean differences between paired FVC remained within the acceptable limits of between-effort reproducibility [17]. Furthermore, we observed no consistent bias between spirometers across regions, suggesting that the variation between devices was random. The LoA were wide and variable across regions for both FEV1 and FVC. This was expected, as other studies have shown that the LoA tend to widen with larger sample sizes and a wider range of measured values [20]. Also, consistent with previous findings, we observed wider LoA between paired FVC than between paired FEV1 [2, 21].

To date, there has been very limited information on the sources of variability between spirometers. The few studies that examined the effect of age and sex on inter-device variability reported disparate findings [2, 7, 9]. These studies were generally small and included healthy volunteers across a limited age range. Our large sample size and diverse population enabled a robust analysis of the potential sources of variation between spirometers at the region, country, center and participant levels. We found that the largest source of variability was at the participant level, with much smaller contributions at the region, country or center levels. Importantly, participant factors such as older age, higher BMI, previous and current smoking and known COPD/asthma did not adversely affect the variation between spirometers. However, participants with a low education level or poor quality grades were more likely to show lower correlation and larger variation between spirometers. Even in these subgroups, the mean differences between spirometers remained small and unbiased, suggesting sufficient precision and comparable estimates of group means across devices.

Our findings have a number of implications. First, we report on the robustness of the FEV1 measurement, which was highly correlated between devices, with small and unbiased differences across diverse populations. The correlation and mean differences between paired FVC, however, were more variable, although unbiased, across regions. This suggests that a more customized, region-specific approach may be needed to adjust for the larger differences in FVC between spirometers. Second, the LoA were wide but random, suggesting considerable between-subject variability in agreement between devices. In this regard, it is important to differentiate the need for individual-level versus group-level precision in estimating lung function for different types of studies. In population-based studies, where exclusion of participants is undesirable (since excluded participants may be systematically different from those included), there will inherently be larger inter-subject variability. Furthermore, population-based studies focus mainly on average differences in lung function between populations or on mean changes over time. In this context, it is more relevant to determine whether, on average, the recordings from different devices are well correlated and collected without systematic bias. Therefore, the precision of group mean estimates, which determines whether trends are estimated accurately, is more important than the precision of individual measurements. By contrast, in clinical studies, within-subject variability may be more relevant when assessing changes in lung function within individuals or small groups in response to an intervention; here, the precision of individual measurements is likely to be more important. To that end, our findings suggest that the two spirometers were, on average, highly correlated and had sufficiently high precision to estimate group means without bias in the overall population and in key subgroups.
Furthermore, when the data were transformed using GLI normative values, we observed very small and acceptable differences in the mean z-scores across spirometers, suggesting limited impact on interpretation of the data. Lastly, we did not observe a large contribution to the variation between spirometers at the region, country or center levels, suggesting consistent execution of spirometry measurements across these levels. The main source of variation identified was at the participant level and may be related to factors such as low education level and poor-quality spirometry efforts. To this end, while every reasonable effort should be made to increase the precision of individual lung function measurements, efforts beyond what is easily achievable may not necessarily increase the power of the study, but could considerably increase its complexity and cost and therefore compromise study feasibility [22]. Moreover, such methods may create biases (and distort results), particularly if stringent criteria exclude participants with specific conditions or demographics that may influence lung function.
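The GLI 2012 equations express a measurement as a z-score using the LMS method, z = ((measured/M)^L - 1)/(L·S), where L (skewness), M (predicted median) and S (coefficient of variation) depend on age, height, sex and ethnic group. Because L, M and S do not depend on the device, a participant's z-score difference between spirometers reflects only the difference in measured values. A minimal sketch follows; the function names are hypothetical, and the L, M and S values in the example are placeholders rather than GLI coefficients, which must be taken from the published GLI lookup tables.

```python
import math
import statistics

def lms_zscore(x, L, M, S):
    """z-score of a measurement x under the LMS (Box-Cox) method used by GLI 2012."""
    if x <= 0 or M <= 0 or S <= 0:
        raise ValueError("x, M and S must be positive")
    if L == 0:
        # Limiting case of the Box-Cox transform as L -> 0.
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)

def mean_z_difference(records):
    """Mean z-score difference (device A - device B) across participants.

    records: iterable of (x_a, x_b, L, M, S); L, M, S are the participant's
    reference parameters, shared by both devices.
    """
    return statistics.mean(
        lms_zscore(x_a, L, M, S) - lms_zscore(x_b, L, M, S)
        for x_a, x_b, L, M, S in records
    )

# Hypothetical FEV1 values (L) with placeholder L, M, S parameters (not GLI values).
records = [
    (2.91, 2.95, 0.9, 3.10, 0.12),
    (3.40, 3.32, 0.9, 3.45, 0.11),
    (2.15, 2.20, 1.0, 2.50, 0.13),
]
print(f"mean z difference: {mean_z_difference(records):.3f}")
```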

The strengths of our study include the large sample size and the diverse, unselected populations, which increase the generalizability of our findings. Measurements were taken in random order and supervised by the same trained staff, which minimized procedure-related variability. Furthermore, all spirographs available from the EasyOne were inspected and assessed by a staff respirologist to ensure agreement with the device's automated quality grade. Limitations include measurement of lung function without bronchodilation. Bronchodilation can reduce variable airway tone in patients with asthma, which may contribute to the variation between spirometers. However, participants were not asked to withhold any medications before testing. It is therefore reasonable to assume that those with chronic lung diseases, including asthma, would have taken their inhaler medications before spirometry and were therefore less likely to exhibit variable airway tone.

In conclusion, we found moderate to high correlation and small mean differences between paired FEV1 and FVC measurements from the MicroGP and EasyOne spirometers across diverse populations. The differences between paired measurements showed no consistent bias across regions. Our findings support the use of these two spirometers in large long-term studies to provide reliable and comparable measurements, with high correlation and small, unbiased differences between group means across diverse populations.

Supporting information

S1 File

(DOC)

Acknowledgments

We would like to acknowledge the assistance of the following members of our team who were involved in the collection, cleaning and validation of the spirometry data: Maha Mushtaha, Roxanna Solano, Justina Greene, Steven Chen and Alex Dragoman. We would also like to acknowledge the statistical assistance of Dr Shrikant Bangdiwala and Ms Chinthanie Ramasundarahettige.

Data Availability

The Population Health Research Institute (PHRI) is the sponsor of this study. PHRI believes that the dissemination of research results is vital and that sharing of data is important. PHRI prioritizes access to data for researchers who have worked on the PURE study for a significant duration, have played substantial roles, and have participated in raising the funds to conduct the study. Data will be disclosed upon request and approval of the proposed use of the data by a PURE Review Committee. Specific collaborative projects can be developed with groups with similar data for joint analyses. The underlying data for this clinical study contain personal information and personal health information of the participants involved, which is protected under Canada’s privacy laws, HIPAA (US) and GDPR, amongst other international laws governing privacy. Consent for public disclosure of this information was not obtained, and disclosure could pose a threat to confidentiality and violate privacy laws. PHRI has no objection to sharing the information under confidentiality and with appropriate data protection and privacy, including with the journal statisticians in a timely manner, for verification or validation of the analyses in the paper upon request. As per the Canadian funding body guidelines https://cihr-irsc.gc.ca/e/29072.html (referenced by PLOS), Element 8: “there should be strict limits on access to data and secure procedures for data linkage, subject to data-sharing agreements”. PHRI follows this procedure and does not share or link data from clinical studies publicly where such data are or contain personal health information. Requests for access to data may be sent to the PURE Publications Committee and PHRI Contracts (phri.contracts@phri.ca).

Funding Statement

The funding for the main study of PURE is provided in the accompanying appendix. The current substudy is not funded. The authors received no specific funding for this work. The funders of the main study of PURE had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Ferguson GT, Enright PL, Buist AS, Higgins MW. Office spirometry for lung health assessment in adults: a consensus statement from the National Lung Health Education Program. Respir Care 2000;45:513–30.
  • 2. Gerbase MW, Dupuis-Lozeron E, Schindler C, Keidel D, Bridevaux PO, Kriemler S, et al. Agreement between spirometers: a challenge in the follow-up of patients and populations? Respiration 2013;85:505–14. doi: 10.1159/000346649
  • 3. Viegi G, Simoni M, Pistelli F, Englert N, Salonen R, Niepsuj G, et al. Inter-laboratory comparison of flow-volume curve measurements as quality control procedure in the framework of an international epidemiological study (PEACE project). Respir Med 2000;94:194–203. doi: 10.1053/rmed.1999.0672
  • 4. Swart F, Schuurmans M, Heydenreich JC, Pieper CH, Bolliger CT. Comparison of a new desktop spirometer (Spirospec) with a laboratory spirometer in a respiratory out-patient clinic. Respir Care 2003;48:591–95.
  • 5. Rebuck DA, Hanania N, D’Urzo AD, Chapman KR. The accuracy of a handheld spirometer. Chest 1996;109:152–57. doi: 10.1378/chest.109.1.152
  • 6. Nelson SB, Gardner R, Crapo RO, Jensen RL. Performance evaluation of contemporary spirometers. Chest 1990;97:288–97. doi: 10.1378/chest.97.2.288
  • 7. Milanzi EVB, Koppelman GH, Oldenwening M, Augustijin S, Aalders-de Ruijter B, Farenhorst M, et al. Considerations in the use of different spirometers in epidemiology studies. Environ Health 2019;18:39. doi: 10.1186/s12940-019-0478-2
  • 8. Maree DM, Videler EA, Hallauer M, Pieper CH, Bolliger CT. Comparison of a new desktop spirometer (Diagnosa) with a laboratory spirometer. Respiration 2001;68:400–04.
  • 9. Kunzli N, Ackermann-Liebrich U, Keller R, Perruchoud AP, Schindler C. Variability of FVC and FEV1 due to technician, team, device and subject in an eight centre study: three quality control studies in SAPALDIA. Swiss Study in Air Pollution and Lung Disease in Adults. Eur Respir J 1995;8:371–76. doi: 10.1183/09031936.95.08030371
  • 10. Caras WE, Winter MG, Dillard T, Reasor T. Performance comparison of the handheld MicroPlus portable spirometer and the SensorMedics Vmax22 diagnostic spirometer. Respir Care 1999;44:1465–73.
  • 11. Barr RG, Stemple KJ, Mesia-Vela S, Basner RC, Derk SJ, Hennenberger PK, et al. Reproducibility and validity of a handheld spirometer. Respir Care 2008;53:433–41.
  • 12. Teo K, Chow CK, Vaz M, Rangarajan S, Yusuf S, et al., on behalf of the PURE Investigators-Writing Group. The Prospective Urban Rural Epidemiology (PURE) study: examining the impact of societal influences on chronic non-communicable diseases in low-, middle-, and high-income countries. Am Heart J 2009;158:1–7. doi: 10.1016/j.ahj.2009.04.019
  • 13. Miller MR, Hankinson J, Brusasco V, Burgos F, Casaburi R, Coates A, et al. Standardisation of spirometry. Eur Respir J 2005;26:319–38. doi: 10.1183/09031936.05.00034805
  • 14. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307–10.
  • 15. World Bank. How we classify countries. http://data.worldbank.org/about/country-classification [cited 2013 Jan 15].
  • 16. Quanjer PH, Stanojevic S, Cole TJ, Baur X, Hall GL, Culver BH, et al., ERS Global Lung Function Initiative. Multi-ethnic reference values for spirometry for the 3–95 yr age range: the global lung function 2012 equations. Eur Respir J 2012;40:1324–43. doi: 10.1183/09031936.00080312
  • 17. Graham BL, Steenbruggen I, Miller MR, Barjaktarevic IZ, Cooper BG, Hall GL, et al., on behalf of the American Thoracic Society and the European Respiratory Society. Standardization of Spirometry 2019 Update. An Official American Thoracic Society and European Respiratory Society Technical Statement. Am J Respir Crit Care Med 2019;200:e70–e88. doi: 10.1164/rccm.201908-1590ST
  • 18. Quanjer PH, Stanojevic S. Do the Global Lung Function Initiative 2012 equations fit my population? Eur Respir J 2016;48:1782–85. doi: 10.1183/13993003.01757-2016
  • 19. Levy ML, Quanjer PH, Booker R, Cooper BG, Holmes S, Small IR. Diagnostic spirometry in primary care: proposed standards for general practice compliant with American Thoracic Society and European Respiratory Society recommendations. Prim Care Respir J 2009;18:130–47. doi: 10.4104/pcrj.2009.00054
  • 20. Stöckl D, Cabaleiro D, Van Uytfanghe K, Thienpont L. Interpreting method comparison studies by use of the Bland-Altman plot: reflecting the importance of sample size by incorporating confidence limits and predefined error limits in the graphic. Clin Chem 2004;50:2216–18. doi: 10.1373/clinchem.2004.036095
  • 21. Bridevaux P-O, Dupuis-Lozeron E, Schindler C, Keidel D, Gerbase MW, Probst-Hensch NM, et al. Spirometer replacement and serial lung function measurements in population studies: results from the SAPALDIA Study. Am J Epidemiol 2015;181:752–61. doi: 10.1093/aje/kwu352
  • 22. Vickers AJ. How many repeated measures in repeated measures designs? Statistical issues for comparative trials. BMC Med Res Methodol 2003;3:22. doi: 10.1186/1471-2288-3-22
PLOS Glob Public Health. doi: 10.1371/journal.pgph.0000141.r001

Decision Letter 0

Andre F S Amaral, Julia Robinson

21 Jul 2021

PGPH-D-21-00083

Differences and agreement between two portable hand-held spirometers across diverse community-based populations in the Prospective Urban Rural Epidemiology (PURE) study.

PLOS Global Public Health

Dear Dr. Duong,

Thank you for submitting your manuscript to PLOS Global Public Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Global Public Health’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 04 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at globalpubhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pgph/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Andre F. S. Amaral, Ph.D.

Academic Editor

PLOS Global Public Health

Journal Requirements:

Additional Editor Comments (if provided):

Reviewers' comments:

**********

Reviewer #1: Thank you very much for the opportunity to review this manuscript. Duong and colleagues compare two different spirometers in a very diverse population and in a large study. This contributes good evidence on comparisons of spirometers, especially in a real-world setting, as previous comparisons have been small and in a controlled setting.

Taking advantage of this large and diverse population, this reviewer feels the analyses could be expanded by comparing the two spirometers using the Global Lung Initiative reference equations. This would also provide vital insight into how the GLI equations behave in this population, as they have been reported to behave differently.

This reviewer is also interested in the risk exposure-lung function associations. Were these associations adjusted or crude? Were the lung function measurements mutually adjusted for, given that one measurement was at baseline and the other at follow-up? More specific details on this part of the analysis would benefit the reader. It would also be interesting to know whether these risk exposure-lung function associations have been assessed independently at baseline and at follow-up in this population. It is not surprising that similar effects were observed for both spirometers, since relatively high correlations were observed and indications are that the use of a particular spirometer was not associated with a particular region or subset of the cohort. Such an association would complicate the risk exposure-lung function relationships, especially if particular regions have higher PM2.5 levels, etc.

**********************************

Reviewer #2: Thank you for the opportunity to review this valuable sub study from an international multi-centre prospective cohort examining the reliability in measurements between two handheld spirometers. The study uses a large and diverse population to support reliability and generalisability, and presents further findings according to strata and pollution exposure. These findings will support the global health community in ensuring accessibility to lung function testing. I have a number of comments and suggestions for sensitivity and sub analysis that would support the authors’ message and reliability of evidence.

• The authors report little effect between strata according to sex, age category, BMI category, smoking status, and COPD/asthma; however, no tests are reported to confirm the lack of a significant effect. Either statistical tests or visualisation of data would support the assertions of minimal variation across strata.

• Certain regions have lower percentages of high grade blows and high percentages of low grade blows. Do the authors have any insight into the rationale for differences in grade between regions? The authors could present supplemental sub analysis to investigate the potential source of variability, such as countries within region. There may be more consistent bias within specific countries, which may represent particular factors e.g. socioeconomics.

• Can the authors report the % of blows that are outside the limits of agreement in the Bland-Altman plots/table, and can the authors report any sub analyses that explore the characteristics of participants who were outside these limits compared to those within, to assess whether there are factors that bias reliability between spirometers? Similarly, it would be interesting to present the grade of EasyOne measures using colours on the Bland-Altman plots to interpret how grade may have contributed to agreement between spirometers.

• The authors present data from 4,603 participants across 18 countries and 7 regions. The demographics of this population is presented in Table 1. Table 4 is based on data from >100,000 individuals which are not demographically reported, it is not clear whether there is overlap of individuals between MicroGP and EasyOne, and it would be interesting to perform sensitivity analysis in those individuals with data from both spirometers. It would also be possible to then specify spirometer in models as a covariate to see whether there is a significant association.

• The authors could present supplemental information that the first five consecutive participants from each community are representative of the wider PURE study, which would also support Table 4 interpretation.

• Can the authors comment on the possibility of misclassification of COPD and asthma, 7% seems potentially low for COPD and asthma as a single variable. Were numbers representative across all regions, or do they reflect specific ones? In supplement, the authors could present a breakdown of strata by region to assist in interpretation of Table 3.

• The authors should confirm whether assumptions for intraclass coefficients are achieved, such as those regarding normality and variance, as the ICC can be sensitive to this.

• The authors demonstrate good correlation and agreement overall, but can the authors comment on whether studies utilising a mixture of spirometers should apply weightings according to certain (demographic?) factors to support standardisation that may enable more accurate data alignment, in particular whether this valuable dataset can estimate such weightings, or whether the authors are confident this would not be necessary. Supplementary sensitivity analysis of, e.g., grade A-C repeat measures within a spirometer vs between spirometers may give further reliable insight into the comparability of datasets; the coefficient of variation may facilitate this.

• There are a few minor typographical errors, for example Table 3 includes 95%CI with some missing or erroneous symbols. E.g. FVC No COPD/Asthma mean Diff % -2.5 (1.6, 3.3): is this a negative difference with erroneous 95%CI or positive difference? Similarly for FEV1 age >65 and 50-65 mean diff %. There may be some issues with 95%CI values for females FVC and males FEV1 that authors should check. Table 4 adjusted model EasyOne-FVC -22,8 rather than -22.8

**********************************

Reviewer #3: Summary

The authors compared the correlation and agreement between two spirometry devices (MicroGP and EasyOne) in a subset of 4603 participants from the PURE cohort study. Device allocation was random and both tests were performed on the same day. Appropriate statistical methods were chosen, including intra-class correlation coefficients and Bland-Altman plots. Results were stratified by clinical and epidemiological characteristics and presented appropriately in figures and tables. Strong correlations were noted between paired FEV1 and FVC measurements, and differences between spirometers were largely within the limits of between-effort reproducibility according to ATS/ERS guidelines. Limits of agreement were wide, especially for FVC. Participant characteristics appeared to have minimal impact on agreement between the two devices; however, more variation was noted by location, especially for South America and the Middle East, which may be clinically significant for FVC. A robust discussion of strengths and limitations was presented. The authors concluded that the use of these two spirometers provides reliable data that are without bias. This conclusion, while understandable as the results from the spirometers are highly correlated, is perhaps a little overstretched on account of the wide limits of agreement.

Major concerns

The authors do not provide enough detail on exactly how the random order that decided which spirometer to use first was generated. This is important to avoid a training effect.

The limits of agreement for FEV1 and FVC are wide, and the authors provide justification for this based on previous research in studies with large sample sizes. However, it may be an overstretch to conclude that “Our findings suggest that the use of these two spirometers in large long-term studies provide reliable data that are without bias.” The fact that the limits of agreement are wide suggests some ambiguity in the result, especially for FVC, which is likely clinically significant in some of the populations, and caution should be taken when forming conclusions.

Minor issues

It would be appropriate in this circumstance to provide a brief description in the methods section as to how the two spirometers work and how they are different. For example, ultrasonic spirometers measure flow independently of gas composition, pressure, temperature, and humidity, and why the two spirometers may be expected to produce different results. This would help give context as to why it is important to compare these two devices.

The authors stated that 65% of participants achieved clinically acceptable measurements using the automated grading criteria generated by the EasyOne device; do they have data from the MicroGP from which they can classify spirometry manoeuvres into these ranges? It is important to know how many participants achieved clinically acceptable results on each device if they are going to quote the 65% value from the EasyOne in the publication. This speaks to the ease of use of the monitors.

It may be useful to include mean percent predicted values for each device (e.g. FEV1, EasyOne 98% of predicted, FEV1 MicroGP 96% of predicted). This will help to clearly display how the difference between devices impacts on the predicted norm for each population.

**********************************

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLOS Glob Public Health. doi: 10.1371/journal.pgph.0000141.r003

Decision Letter 1

Andre F S Amaral, Julia Robinson

3 Dec 2021

Differences and agreement between two portable hand-held spirometers across diverse community-based populations in the Prospective Urban Rural Epidemiology (PURE) study.

PGPH-D-21-00083R1

Dear Dr. Duong,

We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you'll receive an e-mail detailing the required amendments. When these have been addressed, you'll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at https://www.editorialmanager.com/pgph/ click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact globalpubhealth@plos.org.

Kind regards,

Andre F. S. Amaral, Ph.D.

Academic Editor

PLOS Global Public Health

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Does this manuscript meet PLOS Global Public Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Global Public Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Thank you to the authors for comprehensively addressing the review comments by undertaking major revisions to the presented findings and methodology. I note that the interpretations are reliable and the reported findings offer added insights regarding the sources of variability. Additionally, I note a number of tables and figures have been provided in appended results which offer further focus and justification of author interpretation. I have no further comments.

Reviewer #3: Thank you for the opportunity to review this resubmission. The authors appear to have made considerable effort to address the comments made at the earlier review. Although not completely explanatory, they have specified that the randomisation procedure to generate the order of spirometry use was done centrally by the coordinating centre. By incorporating the GLI equations into their study, they have demonstrated that the z-score differences between spirometers were small and within an acceptable range, so as not to impact interpretation. In addition, the wording in the conclusions section has been adjusted so as not to overstate their findings. The authors have provided a clear explanation of how the two spirometers work; this adds context to their paper as to why a comparative study is necessary. They have also provided interesting data exploring where the variation between spirometers originates; unsurprisingly, the majority of variation was identified at the individual level. Overall, the changes have much improved the paper, which will be a valuable resource for other longitudinal studies where it is necessary to change spirometers.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Dr Iain Stewart

Reviewer #3: Yes: Ben Knox-Brown

**********

    Attachment

    Submitted filename: PLOSPH_ReviewerResponseNov2021.1.docx
