PLOS ONE. 2020 Dec 3;15(12):e0243351. doi: 10.1371/journal.pone.0243351

Changes to the sample design and weighting methods of a public health surveillance system to also include persons not receiving HIV medical care

Christopher H Johnson 1, Linda Beer 1,*, R Lee Harding 2, Ronaldo Iachan 2, Davia Moyse 2, Adam Lee 2, Tonja Kyle 2, Pranesh P Chowdhury 1, R Luke Shouse 1
Editor: Mohammad Asghari Jafarabadi
PMCID: PMC7714102  PMID: 33270798

Abstract

Objectives

The Medical Monitoring Project (MMP) is a public health surveillance system that provides representative estimates of the experiences and behaviors of adults with diagnosed HIV in the United States. In 2015, the sample design and frame of MMP changed from a system that only included HIV patients to one that captures the experiences of persons receiving and not receiving HIV care. We describe methods investigated for calculating survey weights, the approach chosen, and the benefits of using a dynamic surveillance registry as a sampling frame.

Methods

MMP samples adults with diagnosed HIV from the National HIV Surveillance System, the HIV case surveillance registry for the United States. In the methodological study presented in this manuscript, we compared methods that account for sample design and nonresponse, including weighting class adjustment vs. propensity weighting and a single-stage nonresponse adjustment vs. sequential adjustments for noncontact and nonresponse. We investigated how best to adjust for non-coverage using surveillance data to post-stratify estimates.

Results

After assessing these methods, we chose as our preferred procedure weighting class adjustments and a single-stage nonresponse adjustment. Classes were constructed using variables associated with respondents’ characteristics and important survey outcomes, chief among them laboratory results available from surveillance that served as a proxy for medical care.

Conclusions

MMP's weighting procedures reduced sample bias by leveraging auxiliary information on medical care available from the surveillance registry sampling frame. Expanding MMP's population of focus provides important information on characteristics of persons with diagnosed HIV that complements the information provided by the surveillance registry. MMP methods can be applied to other disease registries or population-monitoring systems when more detailed information is needed for a population; that detailed information can be obtained efficiently from a representative sample of the population covered by the registry.

Introduction

The Medical Monitoring Project (MMP) is a Centers for Disease Control and Prevention (CDC) surveillance system that provides population-based information on behaviors and clinical characteristics of persons with diagnosed HIV [1]. The information collected through MMP includes essential information for preventing HIV-related morbidity and HIV transmission, such as barriers to medical care utilization, adherence to treatment, and sexual behaviors [2].

During 2005–2014, MMP sampled persons from HIV care facilities, which excluded persons who had not been linked to or retained in HIV care [3]. Persons with undiagnosed HIV or diagnosed but not receiving medical care were estimated to account for 92% of HIV transmissions in 2009 [4], and ensuring they receive medical care is a national HIV prevention goal. In 2015, MMP expanded its population of inference to all adults with diagnosed HIV, using CDC’s National HIV/AIDS Surveillance System (NHSS) to construct a frame representing this population. NHSS is an HIV case surveillance registry first established in 1981 that collects a core set of information on the characteristics of all persons with diagnosed HIV in all U.S. states and dependent areas [5].

2005–2014 population, frame, and sample design

MMP was designed to produce nationally representative estimates as well as locally representative estimates for participating project areas [6]. States were the primary sampling units (PSUs) and were sampled with probability proportional to size, with the number of AIDS cases reported through the end of 2002 used as the measure of size, resulting in some states’ being selected with certainty. Within sampled states, six jurisdictions with federally funded local HIV surveillance programs brought the total number of independent project areas, or strata in the national sample design, to 23.

The 2005–2014 MMP population of inference implicitly excluded those not receiving care because NHSS was not comprehensive in all states (e.g., before HIV without an accompanying AIDS diagnosis was reportable in every state) and was thus inadequate as a sampling frame. Instead, MMP employed multi-stage probability-proportional-to-size facility-based sampling and generated patient lists for participating facilities, from which patient samples with probability inversely proportional to reported patient volume were drawn. Despite the advantages of constructing a frame in stages, this sampling method was labor- and time-intensive and excluded a key population, persons not receiving HIV care [7].

2015—Present population, frame, and sample design

Each state and territory in the US collects name-based HIV and AIDS case surveillance registry data that include HIV-related laboratory tests, which provide information on HIV care utilization and disease progression. These data are reported to NHSS, after which CDC cleans and de-duplicates records [8]. Completeness and timeliness of reporting improved in the decade following the initiation of MMP, and NHSS now comprises the most comprehensive source of information for the population of interest. The NHSS surveillance registry data submitted to CDC as of December 31, 2014, served as the sampling frame for the 2015 MMP cycle (the first data collection cycle under the new design).

Despite fundamental changes to sampling at the local level, MMP retained the same sample of states selected in 2005 in the new sample design (Fig 1). An analysis of counts of reported 2011 HIV diagnoses showed that the proportional contribution of states to the burden of HIV had not changed appreciably from the distribution of AIDS cases in 2002. Thus, the first-stage design weights reflecting states’ original sampling probabilities were still reasonably close to what they would have been if sampled using more recent HIV diagnosis data and were retained [9]. When weighting probability samples, some practitioners prefer to adjust respondents’ data to comport with population totals via a calibration process [10], rather than through separate adjustments for selection probability and post-stratification. Absent compelling reasons to change the stages of adjustment, we chose to keep weighting methods the same as previously employed (e.g., post-stratification as a separate adjustment).

Fig 1. Map of areas participating in MMP.

Fig 1

Note: These 23 areas were funded to conduct data collection for the 2015 cycle: California (including the separately funded jurisdictions of Los Angeles County and San Francisco), Delaware, Florida, Georgia, Illinois (including the separately funded jurisdiction of Chicago), Indiana, Michigan, Mississippi, New Jersey, New York (including the separately funded jurisdiction of New York City), North Carolina, Oregon, Pennsylvania (including the separately funded jurisdiction of Philadelphia), Puerto Rico, Texas (including the separately funded jurisdiction of Houston), Virginia, and Washington.

In 2015, the sample design and frame of MMP changed from a system that included only HIV patients to one that captures the experiences of persons receiving and not receiving HIV care. In this manuscript, we describe methods investigated for calculating survey weights for the 2015 cycle, the approach chosen, and the benefits of using a dynamic surveillance registry as a sampling frame. Such a comprehensive description of the methods, along with various options considered and ruled out, as well as the rationale for these decisions, has not previously been available. This information can inform other studies that use sample survey methods.

The 2015 to present national population of inference for MMP is all adults with diagnosed HIV living in the US, and eligibility criteria reflect this population (i.e., alive, diagnosed with HIV, aged 18 years or older, residing in the US). Information on presumed current residence was used to construct separate frames from NHSS data for each of the 23 participating project areas. Records in NHSS are de-identified (under provisions of CDC's Assurance of Confidentiality [11]) and include only limited information about where the person currently resides. CDC staff drew simple random samples from the 23 separate frame files, and project area staff linked their samples to local case surveillance systems and extracted more-detailed contact information for locating and recruiting sampled persons. Sample sizes deemed sufficient under the old design ranged from 200 to 800, and precision was expected to increase with unclustered samples (compared with the previous multi-stage, clustered sample design).

Materials and methods

In this manuscript, we present a methodological study of MMP weighting methods. Weighting respondents’ data incorporates three adjustments, correcting for different sampling probabilities, nonresponse, and frame limitations (Table 1). The successive adjustment factors are multiplied together to derive analysis weights. A novel feature of MMP is the creation and use of different sets of weights for national and local estimates [12].

Table 1. Components of MMP analysis weights.

DESIGN WEIGHTING STAGE

Design weight: Ŵ0j = 1/Pj (project areas); Ŵ0j = 1/(Pi · Pj) (national). For individual j sampled with probability Pj. National weights additionally incorporate first-stage sampling of state i with probability Pi.

Multiplicity adjustment: Ŵ1j = Ŵ0j/2 if sampled more than once; Ŵ1j = Ŵ0j if sampled only once.

NONRESPONSE ADJUSTMENT STAGE

Overall nonresponse weight adjustment factor: W2jk = (Σ_{j∈A} Ŵ1jk) / (Σ_{j∈R} Ŵ1jk). Adjustments made within nonresponse classes k, which vary by area as well as nationally. A is all eligible sampled persons; R is respondents only.

Overall nonresponse-adjusted weight: Ŵ2j = Ŵ1j · W2j.

POST-STRATIFICATION AND TRIMMING STAGE

Initial post-stratification factor: W3jh = Th / (Σ_{j∈R} Ŵ2jh). Adjustments made within cells h defined by gender, race/ethnicity, and age, by area and nationally. Th is the number of eligible persons in cell h on the delayed frame; R is respondents only.

Post-stratified weight: Ŵ3j = Ŵ2j · W3j.

Trimmed weight: Ŵ4j = median(Ŵ3) + 4 · IQR(Ŵ3) if Ŵ3j exceeds that cap; Ŵ4j = Ŵ3j otherwise.

Final post-stratification factor: W5jh = Th / (Σ_{j∈R} Ŵ4jh). Within cells h defined by sex, race/ethnicity, and age, by area and nationally.

FINAL WEIGHT STAGE

Final weight: Ŵ5j = Ŵ4j · W5j.

Design weighting

The first component of the weight is the design weight, the reciprocal of the probability of selection. This component is important for national weights but not applied to weights for project areas, for which analysis is conditional on their initial selection and the implied value of the factor is 1. Selection probabilities were uniform within jurisdictions but varied greatly across states—areas with fewer cases had higher sampling rates, and conversely.

The data reconciliation process disclosed some duplicate records on the frame, where multiple records were found to represent the same person. If a duplicate record was sampled, its weight was reduced by half to reflect its multiple opportunities for selection; we capped this multiplicity adjustment at 2 because finding more than one duplicate record was rare. Duplicate records not sampled did not require adjustment, although their contributions to frame totals were correspondingly reduced. Through interview or NHSS updates, some sampled persons were found to have lived in a different project area at the time of sampling. Their records were reassigned to the other project area, but retained the original design weight.
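The design-weighting and multiplicity steps described above can be sketched in a few lines; the sampling probabilities and duplicate count below are invented for illustration and are not drawn from the MMP frame:

```python
def design_weight(p_person, p_state=None):
    """Reciprocal of the selection probability (Table 1, design weight).

    National weights also reflect the first-stage sampling of the state
    (p_state); project-area weights condition on state selection, so the
    state factor is implicitly 1.
    """
    w = 1.0 / p_person
    if p_state is not None:        # national weight
        w /= p_state
    return w

def multiplicity_adjust(w0, n_frame_records):
    """Halve the weight of a person represented by duplicate frame records.

    The adjustment is capped at 2 because finding more than one duplicate
    record was rare.
    """
    return w0 / 2.0 if n_frame_records > 1 else w0

# Example: a person sampled at rate 1/200 in a state sampled at rate 0.5,
# later found to have one duplicate record on the frame.
w0 = design_weight(1 / 200, p_state=0.5)          # 400.0
w1 = multiplicity_adjust(w0, n_frame_records=2)   # 200.0
print(w0, w1)
```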

Nonresponse adjustment

The second component of the weight is the nonresponse adjustment, which is analogous to the reciprocal of the probability of responding. An advantage of the new design is the availability of extensive demographic data from NHSS at the time of frame construction. Using variables significantly associated with nonresponse in bivariate analysis, we conducted multivariable analysis to identify predictors of nonresponse at national and project area levels. We considered these predictors of nonresponse: sex at birth, age, race/ethnicity, residency (US vs. other, and MMP vs. non-MMP jurisdiction), transmission risk category, AIDS at HIV diagnosis, time since last update of contact information, time since diagnosis, most recent viral load measurement, and presumed HIV care status (a three-level measure based on HIV lab results in NHSS: 2 or more HIV labs in the past 12 months that were 90 or more days apart, at least 1 HIV lab in the past 12 months, and 0 HIV labs in the past 12 months). When an adjustment method required us to choose the strongest predictors from those that remained significant in multivariable analyses, we ranked them by absolute log odds ratio (|log(OR)|, the absolute value of the beta estimate) based on parameter estimates from the final models.

Adjustment method

Previously, MMP weighting employed the weighting class method, in which a few adjustment classes were formed based on variables found in logistic regression analysis to predict nonresponse. The change in MMP sampling methods was an opportunity to investigate other methodological changes, such as whether another weighting method previously explored in national surveys might perform better [13, 14]. Based on weighting methods used in a pilot project of MMP’s new design [15], MMP considered adopting the propensity weighting method. This method allows incorporating more predictors than the weighting class method (for which we could use only the strongest predictors, at most two per project area, lest the classes formed by the resulting cross-classification become too sparse); propensity weighting is a model-based generalization of the weighting class method that permits an arbitrary number of predictors, including continuous variables [16]. The resulting predicted probabilities are then typically grouped into a few categories, often quintiles, to reduce their variability. We implemented both methods to calculate weights, and compared the resulting estimates and their variance.
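The contrast between the two methods can be illustrated on a handful of invented records. In this sketch the predicted propensities are supplied directly rather than fitted by logistic regression, and every adjustment class is assumed to contain at least one respondent:

```python
from collections import defaultdict

def weighting_class_factors(records):
    """Factor per class k: sum of weights of all eligible sampled persons
    in k divided by the sum over respondents in k (Table 1, W2jk).
    Assumes each class contains at least one respondent."""
    num, den = defaultdict(float), defaultdict(float)
    for r in records:
        num[r["cls"]] += r["w"]
        if r["resp"]:
            den[r["cls"]] += r["w"]
    return {k: num[k] / den[k] for k in num}

def propensity_group_factors(records, n_groups=5):
    """Sort by predicted response propensity, cut into n_groups quantile
    categories (quintiles by default), and apply a weighting-class
    adjustment within each group."""
    scored = sorted(({**r} for r in records), key=lambda r: r["p_hat"])
    size = len(scored) / n_groups
    for i, r in enumerate(scored):
        r["cls"] = int(i // size)   # quantile group replaces the class
    return weighting_class_factors(scored)

recs = [  # invented design weights, response flags, classes, propensities
    {"w": 100, "resp": True,  "cls": "in_care", "p_hat": 0.80},
    {"w": 100, "resp": False, "cls": "in_care", "p_hat": 0.70},
    {"w": 100, "resp": True,  "cls": "no_care", "p_hat": 0.30},
    {"w": 100, "resp": False, "cls": "no_care", "p_hat": 0.25},
    {"w": 100, "resp": False, "cls": "no_care", "p_hat": 0.20},
]
print(weighting_class_factors(recs))      # {'in_care': 2.0, 'no_care': 3.0}
print(propensity_group_factors(recs, 2))  # {0: 3.0, 1: 2.0}
```

With these toy data the two methods produce the same adjustment factors; they diverge when continuous predictors or additional covariates shift the propensity ranking away from the class boundaries.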

Adjustment stages

For each adjustment method, we investigated making sequential nonresponse adjustments for two outcomes: noncontact and, among those contacted, nonresponse. Even though contacting sampled persons is necessary before they can be interviewed, most surveys collapse these stages rather than applying separate adjustments that are multiplied together. Our hypothesis was that different factors might be associated with contact than with response among those contacted, and that by making separate adjustments in sequence (implying two adjustment factors multiplied together in a respondent's weight) we might reduce total nonresponse bias.
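The arithmetic behind collapsing the stages is worth making explicit: within a single adjustment class, the noncontact factor times the conditional nonresponse factor equals the single overall factor, so sequential adjustment changes the weights only when the two stages use different predictors or classes. A minimal check with invented counts:

```python
# Invented counts for one adjustment class.
eligible, contacted, responded = 400, 220, 160

f_noncontact = eligible / contacted    # stage 1: adjust for noncontact
f_refusal = contacted / responded      # stage 2: adjust among the contacted
f_single = eligible / responded        # collapsed, single-stage factor

# Within one class the two-stage product equals the single-stage factor.
assert abs(f_noncontact * f_refusal - f_single) < 1e-12
print(f_single)   # 2.5
```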

Noncoverage adjustment

The study population for the MMP 2015 cycle was all persons with diagnosed HIV at the end of the reference year (i.e. December 31, 2014). A year after the MMP data collection cycle ended, we constructed a second, “delayed” frame that included records that would have been eligible if they had been reported at the time of sampling. The dynamic nature of the frame also allowed us to identify cases determined to be ineligible after the sampling date and adjust the population size accordingly. This updated information on the population of persons with diagnosed HIV was used for post-stratification to known totals, correcting for frame limitations.

Population characteristics typically used for post-stratification noncoverage adjustment are those that correlate with key outcome measurements, but are unavailable for nonresponse adjustment. However, in MMP demographic and care-related variables were available for the entire sample, regardless of response. A count of delayed frame records provided updated population size estimates by sex, age, and race/ethnicity. Post-stratifying to these totals forces the sample-based estimate of population size to conform while correcting for late reports and updated eligibility information. The final post-stratification adjustment provided additional protection against the possibility that the sample-based nonresponse adjustments had distorted demographic distributions.

Post-stratification also incorporated a weight-trimming process that limited the weights’ variability and thereby the variance of estimates. Within adjustment cells, initial weights were compared to the median weight plus 4 times the interquartile range and truncated if they exceeded this cap. These capped weights were then post-stratified to re-adjust and increase any weight sums reduced by trimming. However, due to the intensive follow-up necessary for MMP recruitment in addition to the continual post-sampling updating of the surveillance registry data, enhanced information revealed ineligible records and inaccurate residence classification, and adjusting for these reduced the estimated population totals.
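The trimming and re-post-stratification steps can be sketched as follows, using invented weights and an invented delayed-frame cell total:

```python
import statistics

def trim(weights, k=4):
    """Cap each weight at median + k * IQR (Table 1 trimming rule)."""
    med = statistics.median(weights)
    q1, _, q3 = statistics.quantiles(weights, n=4)
    cap = med + k * (q3 - q1)
    return [min(w, cap) for w in weights], cap

def post_stratify(weights, cell_total):
    """Rescale so the weights again sum to the delayed-frame cell total,
    restoring any weight sum reduced by trimming."""
    f = cell_total / sum(weights)
    return [w * f for w in weights]

w3 = [100, 110, 120, 130, 140, 150, 160, 170, 180, 2000]  # one extreme weight
w4, cap = trim(w3)                    # cap = 145 + 4 * (172.5 - 117.5) = 365
w5 = post_stratify(w4, cell_total=1800)   # invented T_h for this cell
print(cap, max(w4), round(sum(w5), 6))
```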

Variance estimation

Calculating survey estimates requires application of appropriate weights and, for standard errors to accompany point estimates, application of appropriate design variables. We developed strata and cluster variables that accounted for the sample design. Because of the two-stage, stratified sample design, different sets of sample design variables are employed for variance estimation at the national and local levels (an unusual feature of MMP).

Nationally, many states (which were the PSUs in the stratified probability proportional to size design) were sampled with certainty because of their large numbers of persons living with AIDS, and each of these was defined as its own stratum. Among non-certainty PSUs, strata were created by grouping 2–3 states that had similar selection probabilities. To provide stratum-level between-cluster variance components, all strata needed at least two clusters. For certainty PSUs, patients were the clusters. For the strata composed of non-certainty states, the state was the cluster.
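The national computation described above amounts to the standard stratified, with-replacement between-cluster variance estimator for a weighted total; a minimal sketch with invented weighted cluster totals:

```python
def strat_cluster_variance(strata):
    """Between-cluster variance of a weighted total:
    sum over strata h of n_h/(n_h - 1) * sum_i (z_hi - zbar_h)^2,
    where z_hi is the weighted total for cluster i in stratum h.
    Every stratum must contain at least two clusters."""
    v = 0.0
    for z in strata:
        n = len(z)
        zbar = sum(z) / n
        v += n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)
    return v

# Invented totals: one certainty stratum with persons as clusters, and one
# stratum of grouped non-certainty states with the state as the cluster.
print(strat_cluster_variance([[10, 12], [8, 9, 13]]))   # 25.0
```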

For project area estimates, variance estimation is conditional on the initial sampling of states as PSUs, meaning that this stage of sampling is ignored and the design is adequately described as a simple random sample. The sampling fraction exceeded 10% in only the smallest state sampled, Delaware, and we determined there was no need to apply a finite population correction factor [17] in any area.

In accordance with guidelines for defining public health research, CDC and most project areas have determined MMP is public health surveillance used for disease control, program, or policy purposes. Local institutional review board (IRB) approval is obtained from the University of Puerto Rico Medical Science Campus IRB and the Virginia Department of Health IRB. Written or documented informed consent is obtained from all participants, as required by local areas.

Results

Respondents

Weighted characteristics of adults with diagnosed HIV from the 2015 MMP cycle have previously been reported [2]. In brief, an estimated 75% (95% confidence interval [CI]: 72.1–77.4) were male, 48% (CI: 44.1–51.3) identified as heterosexual or straight, and 41% (CI: 31.0–51.4) were Black or African American. An estimated 62% (CI: 58.8–64.9) had received an HIV diagnosis at least 10 years earlier.

Response rates

Of 9,700 persons sampled from an initial frame count of 782,718, 521 were determined to be ineligible (5.4%; project area range 1.5%–15.9%, Fig 2). Of these, 299 died before the sampling date (but their death had not yet been reported to NHSS), 356 lived outside an MMP jurisdiction, 40 had no HIV diagnosis, 4 were duplicates of another sampled person, and 1 was less than 18 years of age.

Fig 2. MMP response rates.

Fig 2

The national response rate was 39.8% (range 30.8%–48.7%). Of the initial eligible sample of 9,179, there were 5,525 eligible nonrespondents (60.2%). Of these, 1,459 (15.9%) were contacted but either refused or did not respond to contact attempts, while 4,066 could not be located and were never contacted (44.3%). The national contact rate among eligibles was 55.7% (range 38.8%–79.8%). Of the 5,113 who were both eligible and contacted, 3,654 responded; thus cooperation (i.e., response among those contacted) was 71.5% nationally (range 55.7%–91.6%). Response rates in 2015 were generally comparable to those experienced in earlier data collection cycles under the old design [16].
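The national rates above follow directly from the reported counts:

```python
sampled, ineligible = 9_700, 521
eligible = sampled - ineligible            # 9,179
contacted, responded = 5_113, 3_654

response_rate = responded / eligible       # response among all eligibles
contact_rate = contacted / eligible        # located and contacted
cooperation_rate = responded / contacted   # response among those contacted

print(round(100 * response_rate, 1),
      round(100 * contact_rate, 1),
      round(100 * cooperation_rate, 1))    # 39.8 55.7 71.5
```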

A concern about the new design was that contact information might be outdated or incorrect, particularly for those diagnosed with HIV years earlier. A small proportion of respondents, 3.7%, were found to have lived in a different MMP jurisdiction on the sampling date than the jurisdiction from which they were sampled. Using information from the delayed frame, we found that about 3.6% of persons were ineligible because they had moved out of the sampling jurisdiction before the sampling date or, more rarely, had never lived in the sampling jurisdiction. Because many sampled persons were never contacted, however, the actual percentage who moved may be higher. Challenges in fielding the MMP sample reflect the limitations of case surveillance data due to ineligibility or insufficient location information.

Nonresponse adjustment

In our nonresponse analyses, the strongest predictor of response, nationally and in 17 project areas, was presumed HIV care status. Measured by absolute log odds ratios, this effect ranged across areas from 0.472 to 0.978 when comparing those not receiving care to those in care (corresponding to odds ratios range 0.624–0.376). Because a person’s care status is dynamic and laboratory reports can be delayed, we used updated information from NHSS to measure care status in the period preceding a respondent’s interview (and, for nonrespondents, their care status in the period before the median time of interview for respondents) rather than at the time of sampling.

The adjustment variables varied across areas in both methods, but the weighting class and propensity models were most often equal, using the same set of categorical adjustment variables. Locally, few continuous variables exhibited strong associations with response, leading to identical models in some cases (and only slightly different models when grouping propensities into quintiles resulted in collapsing categories that were distinct in the corresponding weighting class model for an area). In the final models, care status was significant in a majority of areas, and in 10 cases no additional factor was significant. Although the model fit statistics were comparable for both methods (see S1 Appendix), we chose the cell weighting method because, having less variance, it performed better than the propensity method.

For both models, single-stage and two-stage adjustments produced similar results for national estimates, both for the resulting weighted estimates themselves and for the variance added due to weighting. Locally, when one or more variables were significantly associated with noncontact, in many cases (11 of 22 areas) no variable was associated with nonresponse among the contacted, possibly due in part to the reduced number of persons contacted in areas with low contact rates. In one instance, no variable was associated with noncontact, but variables were associated with nonresponse. Although we did not find this using these data, the same factor could operate in opposite directions across the two stages (e.g., decreasing contact but increasing response) and would thus have an attenuated effect on total response, since the adjustment factors are multiplied together and the final factor reflects their cumulative effect. Because implementing a two-stage adjustment was generally not feasible locally, we opted for a single-stage adjustment nationally.

Discussion

The 2015 data collection cycle expanded the scope of MMP. As expected, including persons diagnosed with HIV but not receiving care added a harder-to-reach population. Many of the key outcomes monitored by MMP relate to receipt of care. Variables strongly correlated with both key outcomes and response, such as care status, are normally ideal candidates for use in nonresponse adjustment. They must also be available for both respondents and nonrespondents, and are usually measured when the sample is drawn. Their utility for this purpose in MMP, however, is somewhat limited by the dynamic nature of care. Many people classified as out of care when sampled had in fact received care, but lab results associated with visits are subject to reporting delays—that is, these people were misclassified, based on the initial information. Information from the delayed frame updated care status for respondents and nonrespondents alike, but medical records abstracted for respondents may disclose care not yet reflected in surveillance records. Thus, this group of respondents is affected disproportionately, as is the efficiency of this information for reducing nonresponse bias. However, our use of a continually updated national HIV surveillance registry as a frame allowed us to partially correct for this. This approach may be useful for other studies of populations for which an important characteristic is dynamic and associated with both response and other key variables measured by the study. In addition, the use of updated information for weighting allowed us to adjust for noncoverage and eligibility, thereby improving the quality of the data beyond what would be possible if we had used only the initial sampling frame.

In addition to misclassification of care status, response was low among those presumed to be out of care according to NHSS, and many of those who responded were later determined to be in care based on interview and medical record abstraction. The nonresponse adjustment factors for this group were so high in certain areas that their weights were capped during the weight-trimming stage. Weight trimming to reduce variance, while potentially increasing bias, is a standard trade-off in weighting and is accepted practice. Still, capping the weights limited out-of-care respondents’ contributions to weighted estimates, making weighted estimates for the entire MMP population more similar to estimates for the in-care subpopulation, since the in-care respondents’ weights were not reduced.

An appealing aspect of the propensity method was the opportunity it offered to use a common set of predictors in all areas when constructing national weights, bringing greater uniformity to our methods. For the sake of interpretability of the propensity models at the local level, however, we chose to include only those predictors exhibiting significant bivariate associations with the local outcome. This provided more continuity with weighting methods employed in previous MMP cycles. Like the weighting class method, the propensity method also involved a screening step for bivariate association, so it involved no less effort and offered no logistical advantage for the current MMP design.

We chose to continue to adjust for overall nonresponse using a single-stage adjustment. The two-stage weighting approach conferred little benefit to local estimates; ultimately, few areas had distinct adjustment factors for noncontact and contacted nonresponse. Although our hypothesis was motivated by considerations that were behavioral and logistical rather than statistical (i.e., that determinants of contact and response might differ), the data did not support modeling such a sequential process. In a different application or with less sparse data, however, a multi-stage adjustment might improve the representativeness of weighted results. Nationally, although sample sizes were large and there were significant and distinct associations for each stage of nonresponse, we judged the two-stage adjustment not worth the additional analytic effort of building separate models.

After evaluating weighted sums, design effects, and key weighted estimates under each approach, we chose to continue using the cell weighting method, coupled with a single-stage adjustment. For project areas, the cell weighting method had a smaller weighting-induced design effect than the propensity method. For a majority of project areas, both methods found significant predictors for only a single stage of nonresponse.

Public health implications

Under the old design, MMP studied a subset of persons with diagnosed HIV receiving care and reached them indirectly through multi-stage sampling. Starting with a frame that includes all persons with diagnosed HIV allows for better integration of MMP and NHSS, and this closer linkage benefits both systems. Many survey frames are static, but NHSS is ongoing surveillance, subject to periodic updates, presenting both opportunities and challenges. Updated information on deaths, new case reports, and residence was key to assigning appropriate weights to sampled persons and was used to evaluate the quality of information available on the sampling date.

MMP changed its design to tap into an existing registry whose completeness and timeliness are well established, sparing the considerable, ongoing effort required to create frames of providers and patients. Using NHSS also means that MMP inherits both its strengths and limitations. NHSS is extensively used for reporting HIV prevalence and trends, providing information on a limited number of key characteristics of persons with diagnosed HIV, but was not designed to be a survey frame. Different considerations influenced NHSS’s development, and the inherent difficulty of reporting on recent receipt of care makes it less than optimal for purposes of weighting MMP data, which depends on a participant’s date of interview during a relatively long field period.

MMP complements the breadth of NHSS, which includes all persons with diagnosed HIV but has limited information about them, with in-depth information from personal interviews and abstracted medical records. Sample surveys using MMP methods could be feasible for supplemental surveillance in other disease registries and population-monitoring systems whose timeliness and completeness are established. Doing so is often more cost-effective than developing new frames [12]. Moreover, the effort required to locate and contact sampled subjects may disclose problems in the routine operation of the underlying system and lead to its improvement.

Supporting information

S1 Appendix. AUC statistics by method and project area.

(DOCX)

Acknowledgments

We thank MMP participants, project area staff, and Provider and Community Board members. We also acknowledge the contributions of the Clinical Outcomes Team and the Behavioral and Clinical Surveillance Branch at CDC and the MMP Project Area Group Members.

Data Availability

Data cannot be shared without restrictions because they are collected under a federal Assurance of Confidentiality. Data are available from the US Centers for Disease Control and Prevention for researchers who meet the criteria for access to confidential data. Data requests may be made to the Clinical Outcomes Team in the Division of HIV/AIDS Prevention at the Centers for Disease Control and Prevention, 1-404-639-6475.

Funding Statement

Funding for the Medical Monitoring Project is provided by a cooperative agreement (PS15-1503) from the US Centers for Disease Control and Prevention (CDC). CDC has a contract with ICF International, Inc. for operational and technical support to conduct the Medical Monitoring Project. The funder provided support in the form of salaries for all authors but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of the authors are articulated in the 'author contributions' section.

References

  • 1. McNaghten AD, Wolfe MI, Onorato I, et al. Improving the representativeness of behavioral and clinical HIV/AIDS surveillance in the United States: the rationale for developing a population-based approach. PLoS ONE. 2007;2(6):e550. doi:10.1371/journal.pone.0000550
  • 2. Centers for Disease Control and Prevention. Behavioral and clinical characteristics of persons with diagnosed HIV infection—Medical Monitoring Project, United States, 2015 cycle (June 2015–May 2016). HIV Surveillance Special Report 20. 2018. https://www.cdc.gov/hiv/pdf/library/reports/surveillance/cdc-hiv-surveillance-special-report-number-20.pdf. Accessed August 28, 2018.
  • 3. Frankel MR, McNaghten AD, Shapiro MF, et al. A probability sample for monitoring the HIV-infected population in care in the U.S. and in selected states. Open AIDS J. 2012;6(Suppl 1: M2):67–76.
  • 4. Skarbinski J, Rosenberg E, Paz-Bailey G, et al. Human immunodeficiency virus transmission at each step of the care continuum in the United States. JAMA Intern Med. 2015;175(4):588–596. doi:10.1001/jamainternmed.2014.8180
  • 5. Nakashima AK, Fleming PL. HIV/AIDS surveillance in the United States, 1981–2001. J Acquir Immune Defic Syndr. 2003;32(Suppl 1):S68–S85.
  • 6. Iachan R, Johnson C, Harding RL, et al. Design and weighting methods for a nationally representative sample of HIV-infected adults receiving medical care in the United States–Medical Monitoring Project. Open AIDS J. 2016;10:164–181. doi:10.2174/1874613601610010164
  • 7. Institute of Medicine. Monitoring HIV care in the United States: indicators and data systems. Washington, DC; 2012.
  • 8. Centers for Disease Control and Prevention. HIV Surveillance Report, 2016; vol. 28. November 2017. http://www.cdc.gov/hiv/library/reports/hiv-surveillance.html. Accessed August 28, 2018.
  • 9. Van den Brakel J, Zhang X, Tam S. Measuring discontinuities due to survey process redesigns. Statistics Netherlands (CBS) Discussion Paper. 2017. https://www.cbs.nl/en-gb/background/2017/30/measuring-discontinuities-due-to-survey-process-redesigns. Accessed August 28, 2018.
  • 10. Deville JC, Sarndal CE. Calibration estimators in survey sampling. J Am Stat Assoc. 1992;87(418):376–381.
  • 11. Centers for Disease Control and Prevention. Assurances of Confidentiality. https://www.cdc.gov/od/science/integrity/confidentiality/index.htm. Accessed October 9, 2018.
  • 12. Harding L, Iachan R, Johnson CH, Kyle T, Skarbinski J. Weighting methods for the 2010 data collection cycle of the Medical Monitoring Project. In: JSM Proceedings, Section on Survey Research Methods. Alexandria, VA: American Statistical Association; 2013:3756–3764.
  • 13. Centers for Disease Control and Prevention. Methodologic changes in the Behavioral Risk Factor Surveillance System in 2011 and potential effects on prevalence estimates. MMWR Morb Mortal Wkly Rep. 2012;61:410–413.
  • 14. Kennet J, Gfroerer J, eds. Evaluating and improving methods used in the National Survey on Drug Use and Health (DHHS Publication No. SMA 05-4044, Methodology Series M-5). Rockville, MD: Substance Abuse and Mental Health Services Administration, Office of Applied Studies; 2005.
  • 15. Padilla M, Mattson CL, Scheer S, et al. Locating people diagnosed with HIV for public health action: utility of HIV case surveillance and other data sources. Public Health Rep. 2018;XX(X):1–8. doi:10.1177/0033354918754541
  • 16. Valliant R, Dever JA, Kreuter F. Practical Tools for Designing and Weighting Survey Samples. New York, NY: Springer; 2013.
  • 17. Cochran WG. Sampling Techniques. New York, NY: John Wiley & Sons; 1999.

Decision Letter 0

Omar Sued

5 Nov 2019

PONE-D-19-17946

Changes to the sample design and weighting methods of a public health surveillance system in order to include persons not receiving medical care

PLOS ONE

Dear Dr. Beer,

Thank you for submitting your manuscript to PLOS ONE. After a long and complex process to identify reviewers who could accept reviewing your article (we invited 20 reviewers who did not accept), and based on the review your article received, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please see the attached comments raised by the reviewer.

We would appreciate receiving your revised manuscript by Dec 20 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Omar Sued, MD

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please disclose the author affiliation to ICF International, Inc. in your competing interest statement.

* Thank you for stating the following in the Competing Interests section:

"The authors have declared that no competing interests exist.".

We note that one or more of the authors are employed by a commercial company: 'ICF International, Inc'.

  1. Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

2. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.  

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to  PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests) . If this adherence statement is not accurate and  there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

* Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

3.  We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

* In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This research aimed to describe methods used for computing survey weights, the approach chosen, and the benefits of using a dynamic surveillance registry as a sampling frame for adults living with HIV from the National HIV Surveillance System in the United States. While the authors have reported an excellent research topic, I believe that the following comments can be also helpful to consider in their revised version. What I believe this manuscript lacks are bare explanations of the methods used in the Methods section and bare presentation of the Results section. The authors should provide a detailed description and presentation of their findings.

Title:

I believe the title should show the area of this research. This is why I think we should have an “HIV” word somewhere like, “… in order to include persons not receiving HIV medical care”. As well, the title can be shorter. For example, “in order to include persons not receiving” can be written as “to also include persons not receiving”. The word “also” (or any other alternative) in this suggested title can reflect that the system had not included such individuals, but these should be also included along with those who receive HIV care.

Abstract:

When the authors claim that “weighting class adjustments and a single-stage nonresponse adjustment performed best,” they have to be more specific about in what regard these methods turned out to be the “best.” Likewise, “strongly associated with” does not reflect anything unless we see some outputs or even some key findings. Please avoid using such vague sentences and provide more evidence for these results.

The abstract does not seem to be a good place to talk about how the results of this HIV case study might also have implications for “other disease registries”. This can be highlighted in the main body of the manuscript.

Introduction:

Paragraph 2:

a) In the second paragraph, second line: should not the sentence “Persons with undiagnosed HIV or diagnosed but receiving medical care” read: “Persons with undiagnosed HIV or diagnosed but NOT receiving medical care”?

b) In the same sentence, this sentence “account for most new HIV transmissions” should be supported by some quantitative data or statistics.

Paragraph 3:

In this sentence, “with the number of AIDS cases,” did the authors mean “HIV and AIDS cases” or only AIDS cases? Make sure that these two, in the current era (and even in the past), are (were) different. If they write all HIV cases, this may include individuals in the AIDS phase, but the other way round does not include HIV cases. Caution regarding the use of the terms AIDS and HIV should be exercised throughout the text.

Paragraph 7 (line 98-99): “and eligibility criteria reflect these criteria” is not clear; reflect which criteria?

I believe the Introduction should have another paragraph, before the Materials and Methods, to address the main objectives of this research and highlight the gaps in the current sampling strategy. From the rest of the Introduction section, we learn that there were two main sampling designs, one from 2005 to 2014 and the other from 2015 to the present. But what the limitations are, or what has not been considered very well in these two designs, especially the present one, should be highlighted in the last paragraph, where the objectives are also listed.

Materials and Methods section:

Is there any possibility to better demonstrate Table 1 with the real values or statistics obtained from this research on how a final weight was constructed? Alternatively, the authors may consider showing, in an MS Excel file, the steps used to construct such weights.

I believe that the authors should explain the reason(s) for each weight in a clear way. For example, “design weight” is a way to deal with sampling error (i.e., which occurs when the selected sample does not fully and accurately reflect the sampling frame); “nonresponse weight” is a way to deal with nonresponse error, which is defined as … .

Design weight paragraph (line 114):

The authors stated that the first component of the weight (design weight) was calculated as the reciprocal of the probability of selection (meaning the number of individuals in the frame population represented by each sampled person). While this is true, the authors should pay attention to the fact that this holds when weights are computed under a simple random sampling approach (i.e., equal inclusion probability for each individual in the sampling frame). However, this is not true when other sampling strategies, such as stratified random sampling or clustered random sampling, might be necessary. In HIV research, and even in the context of this research, cluster sampling (where health facilities might be considered as clusters) is more common. How would different sampling strategies be accounted for in this weighting process?
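As a minimal sketch of this point (with hypothetical stratum sizes, not MMP's actual design), under stratified simple random sampling the design weight is the stratum-specific reciprocal of the selection probability:

```python
# Design weights under stratified simple random sampling.
# Stratum and sample sizes are hypothetical, for illustration only.
frame_sizes = {"stratum_A": 5000, "stratum_B": 1200}   # N_h on the frame
sample_sizes = {"stratum_A": 250, "stratum_B": 120}    # n_h sampled

design_weights = {
    h: frame_sizes[h] / sample_sizes[h]   # 1 / (n_h / N_h)
    for h in frame_sizes
}
# Each sampled person in stratum_A represents 20 frame members and each
# in stratum_B represents 10, so weights vary across strata by design.
```

Under a clustered design the same principle applies, with the overall selection probability compounded across sampling stages.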

The definition of the nonresponse adjustment/weight (i.e., the reciprocal of the probability of responding) without considering the mechanism of missingness is incomplete. The probability that one (i) responds to the survey given that they were sampled (Pr(r|s)) is different when we assume missing completely at random, missing at random (I think this is what the authors selected as their mechanism, but there is no clear justification in the manuscript), or not missing at random. Such a mechanism should be explained clearly in this manuscript.

Nonresponse adjustment paragraph:

Comment on the logistic regression model: a) the absolute log odds ratio (|log(OR)|) probably means the absolute value of the beta estimate (log odds ratio) obtained from logistic regression. This should be clearer; b) in this sentence, “to choose the most significant predictors,” I believe “significant predictors” in the context of statistics and epidemiology refers to statistical significance, while the authors used beta coefficients to choose the “strongest predictors,” as these beta coefficients were ranked. This should be clarified; c) to find the strongest predictors (if this is what the authors did in this step), using the unstandardized format of the variables cannot identify these strong predictors. The standardized version of the included variables should be used to identify such strong predictors. This is another consideration in this step.
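The standardization raised in point (c) can be sketched as follows (hypothetical values; in practice the z-scored predictors would feed the logistic regression):

```python
# Z-scoring a predictor so fitted coefficients are on comparable scales.
# Values are hypothetical, for illustration only.
from statistics import mean, pstdev

x = [2.0, 4.0, 6.0, 8.0]         # a predictor in its original units
m, s = mean(x), pstdev(x)
z = [(v - m) / s for v in x]     # standardized version: mean 0, SD 1

# A coefficient fitted on z is a log-odds change per standard deviation,
# comparable across predictors regardless of their original units.
```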

To enrich the results and methods section of this piece, I would strongly recommend the authors to provide their statistical models for both multivariable logistic regression and propensity weighting approach. These can be provided in both the formula and their software codes (syntaxes). Consider to explain and provide them in an APPENDIX.

I am wondering how the authors incorporated the score obtained from the propensity score approach into the weighting process. I mean, whether they just included the continuous measure of the score in the model, or used a categorical format of the score, or whether they used an inverse-probability-of-censoring weighting (IPCW) approach. If IPCW was used, was it unstabilized (i.e., 1/Pr(r|covariates)) or stabilized (Pr(r)/Pr(r|covariates), where r refers to respondents)? The stabilized approach has been commonly recommended in case this is the approach used for the weighting process. The authors should clarify this process. In case they only used the propensity score, they should also explain why they did not use stabilized IPCW.
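The distinction drawn here can be sketched with hypothetical fitted propensities (not MMP data):

```python
# Unstabilized vs. stabilized nonresponse weights for respondents,
# computed from hypothetical estimated response propensities.
p_hat = [0.8, 0.5, 0.4, 0.8, 0.5]   # fitted Pr(respond | covariates)
responded = [1, 1, 0, 1, 0]         # observed response indicator

p_marginal = sum(responded) / len(responded)   # overall Pr(respond)

unstabilized = [1 / p for p, r in zip(p_hat, responded) if r == 1]
stabilized = [p_marginal / p for p, r in zip(p_hat, responded) if r == 1]
# Stabilized weights are the unstabilized ones scaled by Pr(respond),
# which shrinks extreme weights and typically reduces their variance.
```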

Using a propensity-based approach requires several key assumptions. How did the authors check and satisfy these assumptions? For example, one key assumption of the propensity approach is “positivity” or “common support”: for each value of X (a vector of covariates), there is a positive probability of being both exposed and unexposed (in this manuscript, of both responding and not responding): 0 < Pr(r = 1|covariates) < 1. Another assumption is “conditional independence”; how did the authors check this assumption? “Correct model specification” is another assumption, as is “no unmeasured confounders”: the model is correct only given that there are no unmeasured confounders. Therefore, a detailed explanation of these assumptions is definitely required.

Noncoverage adjustment paragraph:

a) How were the noncoverage adjustments/weights calculated? Were they based on the % of one stratum in the “target population” divided by the % of the same stratum in the sample, or the % of one stratum in the “frame population” divided by the % of the same stratum in the sample? While the total size of the target population (both diagnosed and undiagnosed individuals living with HIV) is unknown (notwithstanding the fact that it can be estimated), I believe it should be clarified in this section that the sampling frame was used to calculate this weight;

b) There is a model-based approach (called raking) that uses the totals of the HIV-diagnosed population (assuming that this is the sampling frame) to create post-stratification weights so that the marginal values of a categorical variable add up to the totals. Given the efficiency of this approach in including many strata in the process of post-stratification adjustment, I am wondering why this approach was not incorporated into the weighting process to better improve the novelty of the approach.
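A minimal sketch of the raking idea (iterative proportional fitting) on a 2×2 table of weighted counts, with hypothetical counts and margin totals:

```python
# Raking a 2x2 table of weighted counts to known margin totals.
# All counts and targets are hypothetical, for illustration only.
cell = [[20.0, 30.0], [25.0, 25.0]]   # rows: one variable; cols: another
row_targets = [60.0, 40.0]            # known frame totals for rows
col_targets = [55.0, 45.0]            # known frame totals for columns

for _ in range(50):                   # iterate until both margins match
    for i in range(2):                # scale each row to its target
        s = sum(cell[i])
        cell[i] = [c * row_targets[i] / s for c in cell[i]]
    for j in range(2):                # scale each column to its target
        s = cell[0][j] + cell[1][j]
        for i in range(2):
            cell[i][j] *= col_targets[j] / s

row_sums = [sum(r) for r in cell]
col_sums = [cell[0][j] + cell[1][j] for j in range(2)]
```

Each pass rescales the table to one set of margins at a time; at convergence both sets of known totals are matched simultaneously, which is why raking can calibrate to several marginal distributions at once.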

Line 192: stratified PPS design should be spelled out as PPS (probability proportional to size).

Variance estimation:

How did the authors define non-certainty PSUs? As well, what about certainty PSUs? These two types of PSUs require more explanation.

Table 1 is not well linked with the text in the Methods section. For example, the formula provided for the nonresponse adjustment stage in Table 1 should be clearly and accurately explained in the text. What are the elements in the numerator, and what are those in the denominator? Consider explaining all of these elements in the text.

Again, I strongly recommend that the authors provide a detailed account of the sampling process when explaining each part of the Methods and, subsequently, the Results section. Providing an MS Excel file (as an APPENDIX) explaining each step in a practical way is highly recommended. This can help the authors reach more of the audience who can evaluate their great research work, and probably improve it in the future. This can help the manuscript become an applied paper such that researchers in other fields (non-HIV fields) can also consider this approach. At each step where information is provided, a comparison of the approaches (the one the authors recommend and the previous sampling approach) is also recommended.

Results:

The authors should provide an N for their sampling frame; 9,700 persons were sampled from what N?

In Fig 2: both n and N would be better reported at each step. For example, 9179/9700 (94.6%) were eligible; 5113/9179 (55.7%) of eligibles were contacted; 3654/9179 (39.8%) of eligibles responded. In addition, is there any possibility of adding ranges to each % at each step in Fig 2? If yes, I would also recommend adding them.
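The step-by-step percentages proposed here can be checked directly from the quoted counts:

```python
# Verifying the suggested Fig 2 percentages from the quoted counts.
sampled, eligible, contacted, responded = 9700, 9179, 5113, 3654

pct_eligible = round(100 * eligible / sampled, 1)          # 94.6
pct_contacted = round(100 * contacted / eligible, 1)       # 55.7
pct_responded = round(100 * responded / eligible, 1)       # 39.8
pct_participation = round(100 * responded / contacted, 1)  # 71.5 among contacted
```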

Results paragraph 2:

This sentence, “Participation was 71.5% nationally,” is not in line with what can be seen in Fig 2. It is true that the conditioning populations are different, but I would recommend having the same numbers in both Table 2 and the text. 71.5% was among those contacted, but this is misleading (an appreciably high %). This should also be reported relative to the total eligibles (which is 39.8%, with range ##, ##).

Results Nonresponse adjustment:

The strongest predictors and their beta coefficients were identified and obtained using an unstandardized format of the variables (right?). If yes, this approach cannot identify the strong predictors, as each predictor has its own specific unit (the units are not consistent). To identify the strong predictors, a standardized approach should be used. This standardized approach is not typically recommended in the health sciences, but for the case of this research, and to identify factors that are strongly associated with response to the survey (not based on P-value, but rather on the magnitude of the estimate), standardized coefficients should be used.

“Presumed HIV care status” should be better explained and defined, as this is the single factor remaining in the model as a strong predictor of response. How was “frequency of HIV lab results” defined? Was it a count variable: 0 results, 1 result, 2 results, 3 results, and so on? What did 0 results indicate, and what did the other values indicate?

Which “model fit statistics” were used to compare both weighting class and propensity model?

The authors chose the cell (class) weighting method “because, having less variance, it performed better than the propensity method.” This means nothing unless there are statistics and some findings to support such a claim. Without these statistics, we cannot evaluate or even trust why class weighting performed better.

The authors believe that class weighting was less variant: a) first, “less variant” requires some quantitative support; b) how much less variance was considered to be performing better? This is why I believe details should be reported, as the journal does not have any limitation on the number of pages or words.

The whole Results and Methods sections are bare, meaning that the authors, unfortunately, did not provide details of their approaches to cover their methods, nor the statistics and results required to support their claims. A detailed explanation and description of the Methods section, as well as a detailed provision of results, are required.

Results last paragraph:

“produced similar results” requires detailed statistics. What does “similar” mean in this context?

The authors state that “Because implementing a two-stage adjustment was generally not feasible locally, we opted for a single-stage adjustment nationally”: is this also a new or innovative part of the Results section? I am wondering which of these two (single-stage or two-stage) was performed in the previous cycles? As the two-stage adjustment was not feasible locally in this new version of the weighting, I think it had also not been feasible previously, right? So is this not really a surprising result of this manuscript? Please provide explanations on this too.

Discussion and conclusion:

The authors should pay attention to the point that performing multiple weighting steps in surveys is meant to improve the REPRESENTATIVENESS of the selected sample. Given this, do the authors believe that a two-stage approach (in which both noncontact and nonresponse are considered in computing weights) could not provide a more accurate estimate, representative of the sampling frame, than a single-stage one? I am wondering how much their findings and suggestions are based on the type of data they use (i.e., a data-driven approach) rather than on evidence supported by statistical theory (i.e., a theory-supported approach) that having multiple or sequential stages in weighting can better provide representative estimates. This should also be discussed in the Discussion section.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Mostafa Shokoohi

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: review.docx

PLoS One. 2020 Dec 3;15(12):e0243351. doi: 10.1371/journal.pone.0243351.r002

Author response to Decision Letter 0


25 Feb 2020

We thank the reviewer for their comments. We have edited the manuscript and feel it has been strengthened as a result. Our point-by-point response is below:

Reviewer: This research aimed to describe methods used for computing survey weights, the approach chosen, and the benefits of using a dynamic surveillance registry as a sampling frame for adults living with HIV from the National HIV Surveillance System in the United States. While the authors have reported an excellent research topic, I believe that the following comments can be also helpful to consider in their revised version. What I believe this manuscript lacks are bare explanations of the methods used in the Methods section and bare presentation of the Results section. The authors should provide a detailed description and presentation of their findings.

Title:

I believe the title should show the area of this research. This is why I think we should have an “HIV” word somewhere like, “… in order to include persons not receiving HIV medical care”. As well, the title can be shorter. For example, “in order to include persons not receiving” can be written as “to also include persons not receiving”. The word “also” (or any other alternative) in this suggested title can reflect that the system had not included such individuals, but these should be also included along with those who receive HIV care.

Response: We edited the title to incorporate these suggestions, although we feel that the phrasing “changes … to include” conveys that such persons were previously excluded.

We are unsure how to address some of the overarching comments, which are at times contradictory. For example, the reviewer suggests that the manuscript lacks “bare explanations” and “bare presentation of results,” but also advises us to provide “a detailed description and presentation.” In our exposition we have attempted to provide a clear explanation of the process we followed without excessive detail, since we employed methods that are not in themselves novel and have been written about extensively (and in sufficient detail to inform those unfamiliar with the methods) by others. We feel that we have not omitted any important step or decision point.

Reviewer: Abstract:

When the authors claim that “weighting class adjustments and a single-stage nonresponse adjustment performed best,” they have to be more specific about the regard in which these methods turned out to be the “best.” Likewise, “strongly associated with” does not convey anything unless we see some outputs or at least some key findings. Please avoid such vague sentences and provide more evidence for these results.

Response: We justify these claims in the Results section and for brevity summarize them in the abstract, where we lack room to provide specifics of the comparison. We have edited the first sentence to tone down its comparative aspect by rephrasing it as “After assessing these methods, we chose as our preferred procedure weighting class adjustments and a single-stage nonresponse adjustment.” We then modified the following sentence to clarify that “Classes were constructed using variables associated with respondents’ characteristics and important survey outcomes, chief among them laboratory results available from surveillance that served as a proxy for medical care.”

Reviewer: The abstract does not seem to be a good place talking about how the results of this HIV case study might also have implications for “other disease registries”. This can be highlighted in the main body of the manuscript.

Response: We prefer to keep this sentence in the abstract because we feel that it is important to highlight the utility of the manuscript findings for other research, programs, and systems. It also emphasizes the value of supplemental surveillance (in which a sample survey leverages data from a disease registry) in public health practice.

Reviewer: Introduction:

Paragraph 2:

a) In the second paragraph, second line: should not the sentence “Persons with undiagnosed HIV or diagnosed but receiving medical care” instead read “Persons with undiagnosed HIV or diagnosed but NOT receiving medical care”?

Response: This error has been corrected.

Reviewer: b) In the same sentence, this sentence “account for most new HIV transmissions” should be supported by some quantitative data or statistics.

Response: We added the specific estimates from the cited article.

Reviewer: Paragraph3:

In this sentence, “with the number of AIDS cases,” did the authors mean “HIV and AIDS cases” or only AIDS cases? Note that these two in the current era (and even in the past) are (were) different. If they mean all HIV cases, this may include individuals in the AIDS phase, but the other way round does not include HIV cases. Caution with regard to the use of the AIDS and HIV terms should be exercised throughout the text.

Response: The sampling was based on AIDS cases and did not include persons with HIV infection who had not progressed to AIDS. This was because HIV non-AIDS diagnoses were not reportable in all U.S. jurisdictions at the time of sampling, early in the project.

Reviewer: Paragraph 7 (line 98-99): “and eligibility criteria reflect these criteria” is not clear; reflect which criteria?

Response: The sentence has been edited to specify the criteria, i.e., alive, diagnosed with HIV, aged 18 years or older, residing in the US.

Reviewer: I believe the Introduction should have another paragraph, before the Materials and Methods, to address the main objectives of this research and highlight the gaps for the current sampling strategy. From the rest of the introduction section, we found that there were two main designs for sampling, one from 2005 to 2014, and the other from 2015 to the present. But, what has been as limitations or has not been considered very well in these two designs, especially the present one, should be highlighted in the last paragraph when objectives are also listed.

Response: We have added a paragraph to the end of the introduction.

Reviewer: Material and Method section:

Is there any possibility to better demonstrate Table 1 with the real values or statistics obtained from this research on how a final weight was constructed? Or, the authors may consider showing these steps in an MS Excel file how to construct such weights.

I believe that the authors should explain the reason(s) for each weight in a clear way. For example, “design weight” is a way to deal with sampling error (i.e., what happens when the selected sample does not fully and accurately reflect the sampling frame); “nonresponse weight” is a way to deal with nonresponse error, which is defined as … .

Response: We have added sentences, or added phrases to existing sentences, to say something about the purpose of each component of the weights described in this section.

The design weights account for unequal sampling fractions when a sample is something other than a simple random sample. Their use is warranted when the sample is deliberately selected in a way that causes it to depart from a proportionate representation of the frame (e.g., when sampling fractions vary by state in recognition of different population sizes). Our exposition lays out the adjustment stages we employed sequentially in constructing weights and the purpose of each.
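As an illustration of this point, a minimal Python sketch of stratum-level design weights; the state names, frame counts, and sample sizes are invented, not MMP values:

```python
# Illustrative only: design weights as the reciprocal of the selection
# probability, with sampling fractions varying by stratum (e.g., state).
# All counts below are invented.
frame_sizes = {"state_A": 10_000, "state_B": 2_500}   # persons on the frame
sample_sizes = {"state_A": 500, "state_B": 250}       # persons sampled

design_weights = {
    stratum: frame_sizes[stratum] / sample_sizes[stratum]
    for stratum in frame_sizes
}
# Each sampled person in state_A represents 20 frame members; in state_B, 10.
```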

Reviewer: Design weight paragraph (line 114):

The authors stated that the first component of the weight (the design weight) was calculated as the reciprocal of the probability of selection (meaning the number of individuals in the frame population represented by each sampled person). While this is true, the authors should pay attention to the fact that this holds when weights are based on a simple random sampling approach (i.e., equal inclusion probability for each individual in the sampling frame). However, this is not true when other sampling strategies such as stratified random sampling or clustered random sampling might be necessary. In HIV research, and even in the context of this research, cluster sampling (where health facilities might be considered as clusters) is more common. How would different sampling strategies be accounted for in this weighting process?

Response: The design weight is by definition the reciprocal of the probability of selection even when the sample is stratified or clustered. In those cases, the selection probabilities vary by stratum or cluster, and the design weights also vary by stratum or cluster. This is also the case in multi-stage sampling, which is relevant to our situation because the populations of MMP sites vary greatly, while the sample sizes by project vary much less.

Cluster sampling by facilities is indeed common in HIV research, and in fact was part of the previous MMP sample design, before the transition to using surveillance registries as the frame. That design, the rationale for it, and the weighting methods we employed in conformance with the design, are described in other publications. We do not see the need to describe how we would account in our weighting process for a sampling strategy we did not employ.

Reviewer: The definition of nonresponse adjustment/weigh (i.e., the reciprocal of the probability of responding) without considering the mechanism of missingness is incomplete. The probability that one (i) responds to the survey given that they were sample (Pr|s) is different when we assume missing completely at random, or missing at random (I think this is what the authors selected as their mechanism, but there are no clear justifications in the manuscript), or not missing at random. Such a mechanism should be explained clearly in this manuscript.

Response: The Missing At Random (MAR) vs. Missing Completely At Random (MCAR) distinction is a foundational issue that even most technical discussions of implementing weighting methods ignore, and it does not seem appropriate to discuss a largely theoretical matter in describing our applied work. The use of nonresponse adjustments based on comparing the characteristics of respondents and nonrespondents is an implicit concession that (unit) response is not MCAR. Both of the adjustment methods we considered implicitly assume MAR.

Reviewer: Nonresponse adjustment paragraph:

Comment on the logistic regression model: a) absolute log odds ratio (|log(OR)|) probably means the absolute value of beta estimate (log odds ratio) obtained from logistic regression. This should be clearer; b) in this sentence, “to choose the most significant predictors”, I believe “significant predictors” in the context of statistics and epidemiology refers to statistically significant, while the authors used beta coefficients to choose the “strongest predictors” as these beta coefficients were ranked. This should be clarified; c) to find the strongest predictors (if this is the case for what the authors did in this step), using the unstandardized format of the variable cannot identify these strong predictors. The standardized version of the included variables should be used to identify such strong predictors. This is another consideration in this step.

Response: We agree that it is clearer to say that we chose “the strongest predictors” rather than “the most significant predictors,” which is also more consistent with the following sentence about ranked |log(OR)|s, and we have edited the phrase. Our intent was not to establish epidemiologic significance but instead to choose among variables for a model. The issue of the standardized format of variables was not a concern for us because the predictors of response that we chose were overwhelmingly categorical, and in no case did we need to choose among continuous predictors.
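To make this variable-selection step concrete, a hypothetical sketch of ranking categorical predictors of response by the absolute value of their estimated log odds ratios; the variable names and coefficient values are invented, not the study's estimates:

```python
# Hypothetical sketch: rank candidate categorical predictors of survey
# response by |log(OR)|, i.e., the absolute value of the logistic
# regression coefficient. All names and values below are invented.
coefs = {
    "care_status_2plus_labs": 1.10,
    "care_status_1_lab": 0.65,
    "age_group_50plus": -0.40,
    "region_south": 0.15,
}

ranked = sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)
# Strongest predictor first: here, the care-status indicators dominate.
```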

Reviewer: To enrich the results and methods section of this piece, I would strongly recommend the authors to provide their statistical models for both multivariable logistic regression and propensity weighting approach. These can be provided in both the formula and their software codes (syntaxes). Consider to explain and provide them in an APPENDIX.

Response: We assume that the reader has some familiarity with logistic regression, which underlies these methods, and do not feel that providing formulas would be helpful. (By contrast, we did feel that providing formulas for the different weights applied was essential, as they help the reader understand the components of the weights as well as the sequence and levels of analysis in the process.) As for the software employed, the SAS and SUDAAN procedures are themselves straightforward applications, but because they are embedded in complex macros, they employ many macro variables and conditionally executed code that would be difficult for someone new to the process to decipher without expending considerable effort.

Reviewer: I am wondering if how the authors incorporated the score obtained from propensity score approach in the weighting process? I mean, whether they just included the continuous measure of the score in the model, or they used a categorical format of the score, or whether they inverse-probability-censoring weighting (IPCW) approach? If IPCW was used, was it a unstabilized (i.e., 1/Pr(r|covariates)) or a stabilized (Pr(r)/Pr(r/covariates); where r refers to respondents). The stabilized approach has been commonly recommended to be used in case this is the approach for weighting process. The authors should clarify this process. In case they only used propensity score, they should also explain why they did not use stabilized IPCW.

Response: We grouped the propensities into categories (usually quintiles) to reduce their variability. This is mentioned in the “adjustment method” paragraph and referred to again in the “nonresponse adjustment” paragraph of the Results section.

We did not consider the IPCW approach, and are familiar with its use only in the context of survival analysis, not as part of propensity weighting for survey nonresponse adjustment. Both the weighting class and propensity weighting methods assume a dichotomous outcome rather than a different mathematical model that incorporates a time-to-event covariate. Nor do we have censored data (unless one were to treat the end of the field period as the censoring event, in which case only those who were contacted and refused would be true non-censored nonrespondents, but we are unaware of such an approach being applied in nonresponse adjustment for survey weighting).
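The propensity-grouping approach described above (grouping estimated propensities into quintiles to reduce weight variability) can be sketched as follows; the propensity values are invented for illustration:

```python
# Hypothetical sketch: estimated response propensities are sorted into
# quintiles, and every case in a quintile receives the same nonresponse
# adjustment (the reciprocal of the quintile's mean propensity), which
# caps the variability of the resulting weights. Values are invented.
propensities = [0.10, 0.15, 0.30, 0.35, 0.50, 0.55, 0.70, 0.75, 0.90, 0.95]

k = 5                                  # number of groups (quintiles)
size = len(propensities) // k
ordered = sorted(propensities)
adjustments = []
for g in range(k):
    group = ordered[g * size:(g + 1) * size]
    mean_p = sum(group) / len(group)
    adjustments.append(1.0 / mean_p)   # one shared adjustment per group
# The least likely responders (lowest quintile) get the largest
# upward adjustment.
```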

Reviewer: Using a propensity based approach requires several key assumptions. How did the authors check and satisfy the assumption? For example, one key assumption of propensity approach is “Positivity” or “common support”, which is for each value of X (which is a vector of covariates), there is a positive probability of being both exposed and unexposed (in this manuscript, both responded and non-responded): 0 <Pr(r = 1|covariates) < 1. Another assumption is “conditional independence”; how the authors checked this assumption? “Correct model specification” is another assumption. “Unmeasured confounders” is another assumption.: The model is corrected, given unmeasured confounders. Therefore, a detailed explanation of these assumptions is definitely required.

Response: The reviewer brings up basic mathematical concepts underlying logistic regression models in general. The assumptions of conditional independence and no unmeasured confounders are not required for the weighting class method, which equates to a cell-means model in regression. The classes are constructed in such a way that the number of respondents and nonrespondents in each is not small, so satisfying the positivity requirement is automatic. Ours was a routine application of methods that are applied frequently within survey practice, and nothing unique about our situation would preclude the use of either technique. Because of that, we see no reason to explain the underlying theory.

Reviewer: Noncoverage adjustment paragraph:

a) How were the noncoverage adjustments/weights calculated? Were they based on the % of one stratum in the “target population” divided by the % of the same stratum in the sample, or the % of one stratum in the “frame population” divided by the % of the same stratum in the sample? While the total number in the target population (both diagnosed and undiagnosed individuals living with HIV) is unknown (regardless of the fact that it can be estimated), I believe this section should clarify that the sampling frame was used to calculate this weight;

Response: This is explained in the second paragraph, where we refer to the updated counts and post-stratifying so that totals match. The corresponding formula in the table shows that this adjustment is the ratio of totals.
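For illustration, the ratio-of-totals post-stratification factor can be sketched as follows; the cell names and counts are invented, not registry values:

```python
# Illustrative only: within each post-stratification cell, weights are
# scaled by the ratio of the updated frame (registry) total to the sum
# of the adjusted weights in that cell. All numbers are invented.
frame_totals = {"male_18_29": 1200.0, "female_18_29": 800.0}   # registry counts
weight_sums  = {"male_18_29": 1000.0, "female_18_29": 1000.0}  # weighted totals

poststrat_factors = {
    cell: frame_totals[cell] / weight_sums[cell] for cell in frame_totals
}
# After multiplying each weight by its cell's factor, the weighted
# totals match the registry counts cell by cell.
```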

Reviewer: b) There is a model-based approach (called Raking) using the totals of HIV diagnosed population (assuming that this is the sample frame) to create post-stratification weights so that the marginal values of a categorical variable add up to the totals. Given the efficiency of this approach in including many strata in the process of post-stratification adjustment, I am wondering why this approach had not been incorporated into the weighting process to better improve the novelty of the approach.

Response: Among calibration methods, surveys often employ raking when external control totals are used to quantify the target population, particularly when these external controls are derived from independent sources. We used only auxiliary variables from the surveillance system, and thus were able to calculate cells in the cross-tabulation of post-stratification variables directly rather than having to estimate them from marginal proportions, as is the case with raking, which would risk increasing bias for some demographic subgroups. Rather than employing novel approaches for their own sake, we chose throughout our work to apply standard techniques consistent with past practice. Choosing unobjectionable, defensible methods while maintaining as much methodological continuity as was reasonable motivated us, given the many other changes to the sample design.
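For readers unfamiliar with raking, a minimal sketch of iterative proportional fitting on a 2x2 table (the alternative not adopted here); all counts are invented:

```python
# Illustrative only: raking (iterative proportional fitting) adjusts
# weighted cell counts so that the row and column margins match known
# control totals, cycling until convergence. All numbers are invented.
cells = [[10.0, 20.0], [30.0, 40.0]]   # weighted counts by two variables
row_targets = [40.0, 60.0]             # e.g., known sex totals
col_targets = [50.0, 50.0]             # e.g., known age-group totals

for _ in range(50):                    # iterate until margins converge
    for i in range(2):                 # scale rows to match row margins
        s = sum(cells[i])
        cells[i] = [c * row_targets[i] / s for c in cells[i]]
    for j in range(2):                 # scale columns to match column margins
        s = cells[0][j] + cells[1][j]
        for i in range(2):
            cells[i][j] *= col_targets[j] / s
# Unlike direct cell post-stratification, only the margins are matched;
# the interior cells are estimated.
```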

Reviewer: Line 192: stratified PPS design should be spelled out as PPS (probability proportional to size).

Response: This has been edited.

Reviewer: Variance estimation:

How did the authors define non-certainty PSUs? And what about certainty PSUs? These two types of PSUs require more explanation.

Response: Selection of PSUs was described previously, in the first paragraph of the section titled “2005 – 2014 population, frame, and sample design.” We have added a phrase there to clarify that some PSUs were sampled with certainty.

Reviewer: Table 1 is not well-linked with the text in the methods section. For example, the formula provided for the nonresponse adjustment stage in Table 1 should be clearly and accurately explained in the text. what are those elements in the numerator and what are those in the denominator? Consider explaining all these elements in the text.

Response: The order of the narrative in this part of the text parallels the elements of the table as well as the stages of adjustment that are applied sequentially. Each component we describe is itself a standard application of the form of adjustment applied, and the paragraph describing that component summarizes its purpose. The table uses subscripts and set inclusion notation to clarify which group is subject to each adjustment. For these reasons, we feel that expanding the narrative to describe every element of every formula would decrease readability without making the process clearer.

Reviewer: Again, I strongly recommend the authors to provide a detailed process of the sampling when explaining each section of the methods and subsequently, the results section. Considering to have an MS Excel file (as an APPENDIX) explaining each step in a practical way is highly recommended. This can help the authors absorb more audiences who can evaluate their great research work, and probably improve it in the future. This can help the manuscript be an applied paper such that researchers in other fields (non-HIV fields) can also consider this approach. In each step they provide information, a comparison of the approaches (the one the authors recommended and the previous approach of sampling) is also recommended.

Response: Sampling in MMP is unusually simple for a national survey. As described in the section titled “2015 – present population, frame, and sample design,” in the paragraph following Figure 1, CDC staff draws simple random samples from each of the 23 separate frame files. There is no further detail to provide.

We appreciate the reviewer’s kind remarks complimenting our work and concern for popularizing it. However, researchers wishing to apply our work to their own situations, which is indeed our hope, will need considerably more background than a manuscript such as this, and we have provided the references we think would be most helpful for acquiring such background (including other publications describing MMP sampling and weighting methods, allowing us to focus in the present manuscript on what is new since the recent changes to the MMP population and frame). We are unsure how providing this information in spreadsheet form would be helpful; much of that, it seems to us, would repeat the table with weight components and accompanying narrative section.

Reviewer: Results:

The authors should provide an N for their sampling frame; 9,700 persons were sampled from what N?

Response: We have added this information.

Reviewer: In Fig 2: both n and N are better to be reported in each step. For example, 9179/9700 (94.6%) were eligible; 5113/9179 (55.7%) of eligibles were contacted; 3654/9179 (39.8%) of eligibles responded. In addition, is there any possibility of adding ranges to each % in each step in this fig 2? If yes, I would also recommend to add them.

Response: We feel that adding numerators and denominators for each proportion mentioned in the figure is not needed because the denominator is easily inferred from the preceding step, and the reader can easily reproduce the proportions cited. Adding ranges across states would make it harder to follow.

Reviewer: Results paragraph 2:

This sentence, “Participation was 71.5% nationally,” is not in line with what can be seen in Fig 2. It is true that the conditioning populations are different, but I would recommend having the same numbers in both Table 2 and the text. 71.5% was among those contacted, but this is misleading (an appreciably high %). This should also be reported among total eligibles (which is 39.8%, with range ##, ##).

Response: In the sentence referred to, we define participation as response among those contacted. This is the ratio of the 3,654 eligibles who responded to the 5,113 eligibles who were contacted (3,654/5,113 = 0.7146 = 71.5%). We have edited the sentence to provide these two counts. We also changed the term “participation” to “cooperation,” so that the rate we present is consistent with Cooperation Rate 2 (COOP2) as defined in AAPOR’s standard definitions.
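The arithmetic behind this rate, using the counts from the response above:

```python
# Cooperation rate in the AAPOR COOP2 sense: eligible respondents
# divided by eligibles who were contacted, using the manuscript's counts.
responded = 3654   # eligibles who responded
contacted = 5113   # eligibles who were contacted

cooperation_rate = responded / contacted
# Roughly 0.7146, i.e., the 71.5% reported nationally.
```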

Reviewer: Results Nonresponse adjustment:

The strongest predictors and the beta coefficients were identified and obtained using an unstandardized format of the variables (right?). If yes, this approach cannot identify the strong predictors, as each predictor has its own specific unit (the units are not consistent). To identify the strong predictors, a standardized approach should be used. This standardized approach is not typically recommended in the health sciences, but for the case of this research, and to identify factors that are strongly associated with response to the survey (based not on the P-value but on the magnitude of the estimate), standardized coefficients should be used.

Response: To our knowledge, standardized coefficients are not commonly used in this particular situation of choosing the strongest predictors of survey response among categorical predictors. Our models overwhelmingly led to the use of categorical variables, so standardizing coefficients was not a concern. In no case were we in the situation of choosing between even two continuous variables, in which case standardizing their scales might have informed the comparison.

Reviewer: “presumed HIV care status” should be better explained and defined as this is the single factor remained in the model to be a strong predictor of response. How “frequency of HIV lab results” was defined? Was it a count variable: 0 result, 1 result, 2 results, 3 results, and so one? What did 0 results indicate and what other results indicated.

Response: We have edited the text to clarify that (while we did indeed have a count variable) we used a three-level indicator of any care, measured by the presence or absence of HIV lab results in the person’s surveillance records over the past 12 months: 2+ HIV labs in the past 12 months 90 or more days apart, at least 1 HIV lab in the past 12 months, or 0 HIV labs in the past 12 months.
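A hypothetical sketch of such a three-level indicator; the function name and inputs are illustrative, not the actual MMP code:

```python
# Hypothetical illustration of the three-level care indicator described
# in the response, derived from HIV lab results in surveillance records
# over the past 12 months. Function name and inputs are invented.
def care_status(n_labs_12mo: int, max_gap_days: int) -> str:
    """Classify presumed HIV care from labs in the past 12 months."""
    if n_labs_12mo >= 2 and max_gap_days >= 90:
        return "2+ labs, 90+ days apart"
    if n_labs_12mo >= 1:
        return "at least 1 lab"
    return "no labs"
```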

Reviewer: Which “model fit statistics” were used to compare both weighting class and propensity model?

The authors chose the cell (class) weighting method “because, having less variance, it performed better than the propensity method.” This means nothing unless there are statistics and some findings to support such a claim. Without these statistics, we cannot evaluate or even trust why class weighting performed better.

The authors believe that class weighting was less variant: a) first, “less variant” requires some quantitative support; b) how much less was considered to be performing better? This is why I believe details should be reported, as the journal does not have any limitation on the number of pages or words.

The whole results and methods sections are bare, meaning that the authors, unfortunately, did not provide details of their approaches to cover their methods, and required statistics and results to support their claim. A detailed explanation and description of the method section as well as detailed provision of results are required.

Response:

We are including, as an appendix of the revised manuscript, a table of AUC statistics. These aided us in evaluating the methods compared and quantify the sort of differences the reviewer mentions. We had no formal threshold for “how much less was considered better” because this was only one of several factors we considered when comparing the two methods.

As mentioned previously, because ours is a standard application of methods that have been extensively used and written about, we feel that we have provided sufficient detail (and references for those who may require more background).
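For readers unfamiliar with the statistic, the AUC can be computed as the proportion of (respondent, nonrespondent) pairs that a model's scores order correctly; a small self-contained sketch with invented scores:

```python
# Illustrative only: AUC as the probability that a randomly chosen
# respondent (label 1) receives a higher model score than a randomly
# chosen nonrespondent (label 0), with ties counted as half.
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pairs = [(p, n) for p in pos for n in neg]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)
```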

Reviewer: Results last paragraph:

“produced similar results” requires detailed statistics. What does “similar” mean in this context?

The authors believe that “Because implementing a two-stage adjustment was generally not feasible locally, we opted for a single-stage adjustment nationally”: is this also a new or innovative part of the results section? I am wondering which one of these two (single-stage or two stages) was performed in the previous cycles? As the two-stage was not feasible locally in this new version of the weighting, I think it had also not been feasible previously, right? So, this is not really a surprising result of this manuscript? Please provide explanations on this too.

Response: We have provided additional information in an appendix relevant to this judgment (which was, admittedly, based on general impressions and not purely quantitative). Also, the remainder of this paragraph describes other considerations that were ultimately greater concerns.

The multi-stage approach was not performed (nor considered) in the previous cycles, and thus there is no historical comparison to make. Because of the clustered design, it would have been difficult, and probably infeasible, before.

Reviewer: Discussion and conclusion:

The authors should pay attention to the point that performing multiple weighting steps in surveys is to improve the REPRESENTATIVENESS of the selected sample. Given this, do the authors believe that a two-stage approach (when both noncontact and nonresponse are considered in computing weights) could not provide a more accurate estimate, representative of the sampling frame, than a single-stage approach? I am wondering how much their findings and suggestions are based on the type of data they use (i.e., a data-driven approach) rather than on evidence supported by statistical theory (i.e., a theory-supported approach) that having multiple or sequential stages in weighting can better provide representative estimates. This should also be discussed in the discussion section.

Response: We have added sentences to the relevant paragraph in the discussion section to address this insightful point. In most of this kind of work, there is often a tension between theoretical considerations and data-driven approaches, and that was certainly the case here.

Attachment

Submitted filename: Response to review 2020 01 31.docx

Decision Letter 1

Mohammad Asghari Jafarabadi

26 Oct 2020

PONE-D-19-17946R1

Changes to the sample design and weighting methods of a public health surveillance system to also include persons not receiving HIV medical care

PLOS ONE

Dear Dr. Linda Beer,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Please submit your revised manuscript by 2020/11/7. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Mohammad Asghari Jafarabadi

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Please find the enclosed file for the comments. Thx

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Comments to the authors PONE-D-19-17946_R1.docx

PLoS One. 2020 Dec 3;15(12):e0243351. doi: 10.1371/journal.pone.0243351.r004

Author response to Decision Letter 1


10 Nov 2020

Thank you for the opportunity to revise this manuscript. Below is a point-by-point response to the reviewer’s comments.

Comments to the authors:

This study is about calculating survey weights, the approach chosen, and the benefits of using a dynamic surveillance registry as a sampling frame, an exciting study. However, many shortcomings should be addressed/implemented.

Here are my detailed comments with special attention in methodological parts:

ABSTRACT

• State the design of the study clearly.

• MMP is described in the Objectives paragraph of the abstract as a surveillance system. Elsewhere in the abstract, we refer to its sample frame and design. The focus of our paper is on weighting methods, so we devote most of the text of the abstract to describing components of the weights and our comparison procedure, with references to study design provided as needed in the text to motivate the discussion of weighting.

• State the source of subjects stated.

• The Methods paragraph of the abstract, first sentence, identifies the HIV case surveillance registry as the source of study subjects.

• Present the results using suitable statistical measures with CI's.

• Because this was a comparison of different weighting methods, results are not well summarized by particular statistical measures and their associated CIs. Unlike manuscripts with a single outcome (or even a small number of key outcomes), this manuscript is different in that we summarize methods for calculating weights used in all analyses of MMP data, whichever outcome might be the focus.

• State the implications of key findings as conclusions based on the results. The last sentence could not be concluded through the obtained results.

• The last sentence is a recapitulation of the definition of supplemental surveillance. We have added text after that to clarify that the more-detailed information (the supplemental information), which would be cost-prohibitive to obtain from all subjects, can instead be collected through a sample survey of a portion of subjects, which is the method described in the paper.

INTRODUCTION

• Explain the gap of knowledge and necessity of the study precisely.

• We have added two sentences to the introduction clarifying that this extended discussion of weighting in MMP has not previously been available, and how this information can inform other studies that use sample survey methods.

METHODS

• Divide the methods section into determined subsections with defined subheadings according to the STROBE statement.

• Most elements of the STROBE rubric are not well suited to the kind of methodologic work that is the basis of this manuscript; they are, rather, more appropriate for reporting results of individual epidemiologic studies (such as particular instances of topic- or subpopulation-specific analyses of MMP topics). Instead, as appropriate, we have provided response rates that are defined by CASRO, which recommends their inclusion and citation in manuscripts. These fundamental concerns of survey operations are a more appropriate framework for reporting on methodological details such as these.

• State the design of the study and its key elements clearly.

• The Methods section describes these in great detail.

• State the setting of the study clearly.

• We describe both the national study population and the states participating, which themselves constitute populations, as well as the roles of the CDC and the state and local health departments involved.

• State the relevant dates of study, including periods of recruitment and data collection.

• We provide the study reference period, the data collection cycle, and other timing considerations for this ongoing surveillance project.

• State the source of subjects.

• We describe the study population and the frame, derived from surveillance records, used to represent it.

• State the number of subjects.

• We provide both frame and sample sizes in the Results section.

• Explain the sampling design and procedures in detail.

• The sampling design is explained in considerable detail, as is necessary to motivate the discussion of the design weight stage of weighting and variance estimation procedure and how these are consistent with the sample design.

• State the study variables (including outcomes, predictors, and potential confounders).

• We have described the relevant outcomes (noncontact, nonresponse) at each stage of the weighting process and the predictors considered, as well as the rationale for considering a variable as a predictor. Confounding is not really a concern in the nonresponse modeling process for the weighting approaches we considered.

• State the name, version, and address of the software.

• Although most calculations were carried out using SAS, such specifics seem not especially relevant, since we are describing procedures that could be implemented in almost any statistical package.

• Describe the statistical analytical methods taking account of the sampling strategy.

• Components of the analysis weights follow the sampling design at every stage, and these correspondences are thoroughly explained in the exposition.

• Describe the sensitivity analyses clearly.

• As with confounding, sensitivity analysis is not applicable for the weighting methods considered.

• Mention the ethics code.

• MMP is considered public health practice and is exempt from human subjects review, but undergoes IRB review routinely where local regulations require it. This is described on pages 13-14.

RESULTS

• Present the characteristics of study participants (e.g., demographic, clinical, and social) and information on exposures and potential confounders.

• In this methodological work, we have described how information on sampled subjects is used in making population-level estimates. Characterizing only study participants (respondents) would be insufficient, since weighting methods (which we describe) compare the characteristics of both respondents and nonrespondents to compensate for nonresponse bias. The scope of topics included in MMP is broad, and exposures and potential confounders appropriate to consider would depend on the particular outcome under study, which is not the purpose of this manuscript.

• Present the table of demographic characteristics.

• Because the purpose of this manuscript is to describe weighting methods, rather than characterizing the sample or the study population, we describe which characteristics are considered as weighting factors, along with the rationale (the conditions they must satisfy to be included in the weighting). Reference 2 in the manuscript describes the demographic characteristics of MMP respondents.

• Summarize and describe the outcome measures of the study with suitable statistical measures and their CI.

• Because the focus of this manuscript is weighting methods, one might think of contact and response as outcomes, and we have described how these are incorporated into the weighting process. Otherwise, this point seems not entirely relevant to the topic of weight construction.

DISCUSSION

• Discuss the generalizability (external validity, applicability) of the trial findings that.

• This comment is unclear – was the sentence truncated? This manuscript describes weighting methods and is not a reporting of trial findings. We do note in the “Public Health Implications” section that sample surveys using MMP methods could be feasible for supplemental surveillance in other disease registries and population-monitoring systems whose timeliness and completeness are established.

• Recommend further studies in the future.

• No further refinements to the MMP sample design or to the weighting procedures currently employed are being considered, although project staff continue to monitor response rates during data collection, as well as predictors of response during the annual weighting process, in case any refinements become necessary.

Decision Letter 2

Mohammad Asghari Jafarabadi

16 Nov 2020

PONE-D-19-17946R2

Changes to the sample design and weighting methods of a public health surveillance system to also include persons not receiving HIV medical care

PLOS ONE

Dear Dr. Beer,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 31 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Mohammad Asghari Jafarabadi

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The comments have been well addressed, some minor points:

1- In the abstract and methods section add the design of the study as "methodological study".

2- A description of participants' profile (although the authors did not agree to add the demographic table) should be explained at the beginning of the results section.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Dec 3;15(12):e0243351. doi: 10.1371/journal.pone.0243351.r006

Author response to Decision Letter 2


17 Nov 2020

Thank you for the review. Our responses to the reviewer’s comments are listed below.

Reviewer #2: The comments have been well addressed, some minor points:

1- In the abstract and methods section add the design of the study as "methodological study".

Response: This has been added to the abstract and methods section (lines 35-36 and 119 in tracked changes version of manuscript).

2- A description of participants' profile (although the authors did not agree to add the demographic table) should be explained at the beginning of the results section.

Response: A brief description of the participant characteristics and a reference to the document in which they are described in full has been added to the results section (lines 222-226 in tracked changes version of manuscript).

Decision Letter 3

Mohammad Asghari Jafarabadi

20 Nov 2020

Changes to the sample design and weighting methods of a public health surveillance system to also include persons not receiving HIV medical care

PONE-D-19-17946R3

Dear Dr. Beer,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Mohammad Asghari Jafarabadi

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Thanks for revising the manuscript properly.
**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Acceptance letter

Mohammad Asghari Jafarabadi

23 Nov 2020

PONE-D-19-17946R3

Changes to the sample design and weighting methods of a public health surveillance system to also include persons not receiving HIV medical care

Dear Dr. Beer:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Mohammad Asghari Jafarabadi

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. AUC statistics by method and project area.

    (DOCX)

    Attachment

    Submitted filename: review.docx

    Attachment

    Submitted filename: Response to review 2020 01 31.docx

    Attachment

    Submitted filename: Comments to the authors PONE-D-19-17946_R1.docx

    Data Availability Statement

    Data cannot be shared without restrictions because they are collected under a federal Assurance of Confidentiality. Data are available from the US Centers for Disease Control and Prevention for researchers who meet the criteria for access to confidential data. Data requests may be made to the Clinical Outcomes Team in the Division of HIV/AIDS Prevention at the Centers for Disease Control and Prevention, 1-404-639-6475.

