BMC Infectious Diseases. 2018 Dec 4;18:617. doi: 10.1186/s12879-018-3536-4

Validation of CD4+ T-cell and viral load data from the HIV-Brazil Cohort Study using secondary system data

Alex Jones Flores Cassenote 1, Alexandre Grangeiro 2, Maria Mercedes Escuder 3, Jair Minoro Abe 4, Aluísio Augusto Cotrim Segurado 1,5
PMCID: PMC6278123  PMID: 30514215

Abstract

Background

The HIV-Brazil Cohort Study (HIV-BCS) is a research project based primarily on data collected from the medical records of people living with HIV/AIDS in Brazil. The aim of this study was to present the validation design and results for the laboratory biomarkers viral load and CD4+ T-cell count from the HIV-Brazil Cohort Study.

Methods

A total of 8007 patients who started cART from 2003 to 2013 were considered eligible for this study. Total follow-up time was 32,397 years. The median duration of follow-up was 3.51 years (interquartile range [IQR] 1.63–6.13 years; maximum 11.51 years). We used secondary data from the Brazilian Laboratory Tests Control System (SISCEL). Laboratory testing rates per 100 person-years (100 py) were used to compare the number of laboratory tests carried out among cohort sites across the different databases for CD4+ T-cell counts and HIV viral load assessments. Descriptive statistics including 95% confidence intervals, the Pearson correlation coefficient, Bland-Altman agreement analysis and the kappa agreement coefficient were applied for analysis.

Results

A total of 80,302 CD4+ T-cell counts and 79,997 HIV viral load assessments were observed in HIV-BCS versus 94,083 CD4+ T-cell counts and 84,810 viral loads from the Brazilian Laboratory Tests Control System. The general CD4+ T-cell HIV-BCS testing rate was 247 per 100 py versus 290 per 100 py and the viral load HIV-BCS testing rate was 246 per 100 py versus 261 per 100 py. The general correlation observed for the lowest quantitative CD4+ T-cell count before cART was 0.970 (p < 0.001) and for the log of the highest viral load before cART was 0.971 (p < 0.001). The general agreement coefficient for categorized CD4+ T-cell count was 0.932 (p < 0.001) and for viral load was 0.996 (p < 0.001).

Conclusions

The current study confirms that biomarkers CD4+ T-cell count and viral load from the HIV-BCS have a high correlation and agreement with data from SISCEL, rendering both databases reliable and useful for epidemiological studies on HIV care in Brazil.

Electronic supplementary material

The online version of this article (10.1186/s12879-018-3536-4) contains supplementary material, which is available to authorized users.

Keywords: HIV-Brazilian cohort study, TCD4+ − cell, Viral load, Validation, Medical records

Background

HIV/AIDS cohort studies based on clinic populations and medical records are becoming more abundant, due in part to an increasing trend toward electronic medical records and advances in information technology [1]. When prospectively collected clinical data are absent or difficult to obtain, epidemiological studies of HIV infection often rely on a variety of secondary sources for patient information, including patient reports, medical records, and surveillance data [2].

Brazil is estimated to have 830,000 people living with HIV and, following national guidelines issued by the Ministry of Health, all should be treated with combined antiretroviral therapy (cART), provided free of charge through the public health sector. In this scenario, the HIV-Brazil Cohort Study (HIV-BCS) is being carried out as a nationwide research project, primarily based on data collection from medical records of people living with HIV/AIDS. Due to its long follow-up period and a significant number of observations, this study is recognized as an important asset to increase the availability of data on outcomes of the National AIDS Program related to the prescription of cART in public healthcare services [3].

In Brazil, AIDS epidemiological surveillance, apart from being based on information provided through the notification of cases recorded on the Notifiable Diseases Information System (Sistema de Informação de Agravos de Notificação - SINAN) and deaths recorded on the Mortality Information System (Sistema de Informação sobre Mortalidade - SIM), also draws information from two other systems: the Laboratory Tests Control System (Sistema de Controle de Exames Laboratoriais - SISCEL) and the Medication Logistics Control System (Sistema de Controle Logístico de Medicamentos - SICLOM). These databases comprise the basis of the National HIV/AIDS register in Brazil [4].

The SISCEL system has been developed with the aim of monitoring CD4 T lymphocyte (CD4+ T-cell) counts and HIV viral load (VL) assessments, biomarkers that are used to decide when patients should start treatment and to monitor patients already under antiretroviral therapy [4].

So far, little effort has been made to correlate multiple sources of data [2]. Comparisons of medical records and HIV/AIDS surveillance data found good agreement for individual data [5–7]. A study performed in North Carolina set out to describe and quantify differences in the year of the first positive HIV test among patient reports, medical records, and HIV/AIDS surveillance data, and concluded that these measures could not reliably be used interchangeably because of the wide variability between them. Although collecting data from patient reports or existing sources is convenient, cost-effective and efficient, there is significant variability among all sources [2]. Another recent study compared measures of retention in HIV care based on both clinic visit data and HIV laboratory surveillance data. Although the authors pointed out important limitations associated with the definitions used for retention, they concluded that the combined use of laboratory and clinic visit-based data provides a more accurate representation of the care status of HIV-infected patients than the use of a single data source alone [8].

Assessing the quality of data extracted from medical records is vital to correctly interpret the results obtained in the HIV-BCS, and should help to identify critical points and open the door to further scientific production. The aim of this study was to present the validation design and results for the laboratory biomarkers CD4+ T-cell counts and HIV VL assessments from the HIV-BCS, using secondary data from SISCEL.

Methods

HIV-Brazil cohort study

HIV-BCS is an ambidirectional cohort involving 13 Brazilian sites, comprising 26 public health facilities in 11 cities across four of the five administrative regions of the country. Patients aged over 18 who were started on cART from 2003 to 2013 are enrolled in this cohort. The facilities were selected based on convenience, by region and city of location, availability of information on the clinical follow-up and the use of cART, and the existing infrastructure to conduct studies of this nature. The cities in which these facilities are located were chosen because they reflect the diversity of the epidemiological profile of AIDS in Brazil. Information about cohort sites, eligibility and inclusion criteria, data sources, outcomes, censoring criteria, availability of data and ethics statement have been previously reported [3].

The cohort data were obtained as part of the routine clinical care provided at the health services (routine-care-based cohort) and were abstracted from patients’ clinical records by trained abstractors onto standardized forms. These clinical records were reviewed at intervals not exceeding 6 months to investigate events recorded during the routine clinical follow-up visits performed within each period.

Different data collection strategies were used in the HIV-BCS: in phase 1 (retrospective cohort), a standardized form was applied to patient medical records and the data were entered into a specific EpiData 3.1 form (The EpiData Association, Odense, Denmark); in phase 2 (prospective cohort), an online standardized form was used and data were entered directly from patients’ medical records into a REDCap (Research Electronic Data Capture, Nashville, United States) file; additionally, the IPEC site (retrospective and prospective) included data from its local cohort and electronic medical records after inclusion of all its patients who met the specific criteria in the study (Fig. 1).

Fig. 1.

The structure of the HIV-BCS database with the different databases in different phases of the study (Phase 1 is retrospective and Phase 2 is prospective)

For this evaluation, we used the following variables: patient code, historical laboratory results including CD4+ T-cell counts (number of cells per mm3) and VL assessments (absolute number of HIV RNA copies/ml and log of number of copies/ml) and date of blood sample collection.

Brazilian laboratory test control system–SISCEL

In 1997, the Department of Sexually Transmitted Diseases and AIDS began deploying the National Network of Laboratories (NNT) to carry out T-cell counts (CD4+/CD8+) and HIV VL assessments. The main objectives of the network are to monitor the course of HIV infection, to guide the initiation of antiretroviral therapy and to provide immunological parameters for the prescription of chemoprophylaxis for opportunistic infections [4].

SISCEL was implemented in all Brazilian states in 2002, and its billing module enables laboratories affiliated to NNT to generate all the information required by the Ministry of Health for billing. Currently, SISCEL is being used in all Brazilian states, with 95 laboratories performing CD4+/CD8+ T-cells counts and 85 laboratories performing VL tests [4].

Information is fully stored in the central database of the Department of Sexually Transmitted Diseases, AIDS and Viral Hepatitis of the Ministry of Health. This database is automatically fed by NNT laboratories and can be accessed by federal, state and municipal managers of STD and AIDS programs over the Internet, with data encryption (Fig. 2). There are two subsystems, one for VL assessments and another for CD4+/CD8+ T-cell counts. The logical framework of SISCEL is described in more detail in another study [5].

Fig. 2.

SISCEL operating system used by National Network of Laboratories (NNT) to feed the database with costs and laboratory results of T-cell counts (CD4+/CD8+) and HIV VL assessments

For this evaluation, we used the following variables: patient code, date of birth, requesting institution, request date, sample collection date, result date, CD4+ T-cell and CD8+ T-cell count, total lymphocyte count, VL copy number and log, and qualitative result of VL (above or below assay lower limit).

Matched system

A single code was generated for each patient across the different databases used in the HIV-BCS study. For patients enrolled in phase 1 and only updated during follow-up in phase 2, a specific field was created in the online REDCap form to enter the patient code used in the retrospective phase.

Linking the HIV-BCS database with the National HIV/AIDS register database was a major challenge because the only identification variables common to the two databases were the patient’s name, the patient’s mother’s name and the date of birth. A computer program based on the bubble sort algorithm was therefore developed in C# to generate unique patient codes for cases that were 100% compatible. Additionally, a researcher evaluated the records that did not match to identify potential problems (for example, incorrectly typed names). The creation of this unique code enables linkage with any database from the National HIV/AIDS register.
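The exact-match step can be illustrated with a minimal sketch. It is written in Python rather than C# and is not the authors' program: the field names, the normalization rule (upper case, accents stripped, whitespace collapsed), the code format and the example records are all assumptions made for illustration, and the sorting step of the original program is not reproduced.

```python
import unicodedata


def normalize(text):
    """Upper-case, strip accents and collapse whitespace (assumed rule)."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.upper().split())


def linkage_key(record):
    """Key built from the three identifiers shared by both databases."""
    return (
        normalize(record["name"]),
        normalize(record["mother_name"]),
        record["birth_date"],
    )


def link_exact(cohort, register):
    """Give one code to each cohort record whose key is found in the register.

    Returns the key -> code mapping and the list of unmatched cohort
    records, which would be left for manual review.
    """
    register_keys = {linkage_key(r) for r in register}
    linked, unmatched = {}, []
    for rec in cohort:
        key = linkage_key(rec)
        if key in register_keys:
            linked.setdefault(key, f"P{len(linked) + 1:05d}")
        else:
            unmatched.append(rec)
    return linked, unmatched


# Invented records, for illustration only.
cohort = [
    {"name": "Maria da Silva", "mother_name": "Ana da Silva", "birth_date": "1980-05-17"},
    {"name": "Joao Souza", "mother_name": "Rita Souza", "birth_date": "1975-11-02"},
]
register = [
    {"name": "MARIA  DA SILVA", "mother_name": "Ana da Silva", "birth_date": "1980-05-17"},
    {"name": "Joao Sousa", "mother_name": "Rita Souza", "birth_date": "1975-11-02"},
]
linked, unmatched = link_exact(cohort, register)
```

In this toy data the second record fails to link because of a one-letter spelling difference in the name, which is the kind of case the manual review step is meant to catch.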

Inclusion, censoring criteria and ethical aspects

For this specific paper we adopted inclusion criteria similar to those of the HIV-BCS [3], but some subjects were excluded (Figs. 1 and 3): (1) no linkage code with National HIV/AIDS register surveillance (1.1% or 100 individuals); (2) no data available for CD4+ T-cell count or HIV VL assessment during clinical follow-up (1.6% or 136 individuals); or (3) no data available for CD4+ T-cell count or VL after initiating cART (4.9% or 431 individuals).

Fig. 3.

Final sample size flowchart. (1) no linkage code with National HIV/AIDS register surveillance; (2) no data available for CD4+ T-cell count or HIV viral load assessment during clinical follow-up; or (3) no data available for CD4+ T-cell count or viral load after initiating cART. Analysis 1 refers to descriptive statistics including mean, median, standard deviation (SD) and interquartile range (IQR) and proportions (%) for the ten first quantitative measures of CD4+ T-cell counts and categorized viral loads (above or below assay lower limit) after cART; analysis 2 including Pearson correlation coefficient and Bland-Altman agreement analysis for quantitative measures of lowest CD4+ T-cell count and log of highest viral load before cART; and analysis 3 is a Kappa coefficient agreement for categorized measures of lowest CD4+ T-cell count before cART and highest viral load before cART

The censoring date considered in the HIV-BCS was July 31, 2014 for patients who started cART from January 01, 2003 to December 31, 2013. In addition, we used the dates of the last CD4+ T-cell count and VL to censor the SISCEL database and prevent the entry of test results beyond the date recorded in the cohort study.

To handle discrepancies arising when the dates of a test (CD4+ T-cell or VL) did not match exactly between the databases, each result was compared with the result of the same test with the closest date, within an interval of 30 days.
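The closest-date rule can be sketched as follows. This is a minimal illustration and not the procedure actually implemented in the study: the function name, the data layout and the choice to treat the 30-day limit as inclusive are assumptions, and the dates and values are invented.

```python
from datetime import date


def closest_within(target_date, candidates, max_days=30):
    """Return the (date, value) pair closest to target_date.

    Only candidates at most max_days away are considered; None is
    returned when no candidate falls inside that window.
    """
    best = None
    for cand_date, value in candidates:
        gap = abs((cand_date - target_date).days)
        if gap <= max_days and (best is None or gap < best[0]):
            best = (gap, cand_date, value)
    return None if best is None else (best[1], best[2])


# Invented example: one cohort CD4+ result and three register results.
cohort_cd4 = (date(2010, 3, 10), 312)
siscel_cd4 = [
    (date(2010, 1, 5), 280),
    (date(2010, 3, 12), 312),
    (date(2010, 6, 20), 355),
]
match = closest_within(cohort_cd4[0], siscel_cd4)
no_match = closest_within(date(2010, 4, 30), siscel_cd4)
```

Here `match` is the register result collected two days after the cohort sample, while `no_match` is `None` because every candidate lies more than 30 days from the target date.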

HIV-BCS was approved by the Institutional Review Boards (IRB) of the participating sites: in the first phase, the IRB waived the requirement for written informed consent, given that confidentiality of the individual’s data was ensured at all stages of the project. In the second phase, all participants provided written consent for participation in the study. This specific study was approved by the Ethics Committee of the Medical School of the University of São Paulo (#229/13).

Statistical analysis

The analysis used in this study aimed to summarize updated information from HIV-BCS and to show the validation process based on matching CD4+ T-cell counts and VL assessments between the HIV-BCS database and the SISCEL database. In the first approach (considering the sample size shown in Fig. 3 – Analysis 1), central tendency and dispersion statistics were used to characterize the cohort follow-up, and laboratory testing rates per 100 person-years (100 py) were used to compare the number of laboratory tests carried out among cohort sites across the different databases. In addition, proportions and 95% confidence intervals based on 1000 bootstrap samples [9] were studied for the categorized lowest CD4+ T-cell count before cART and highest VL before cART. For the second approach three analysis strategies were used (Fig. 3 – Analysis 2 and 3):

  1. Descriptive statistics [10, 11] including mean, median, standard deviation (SD) and interquartile range (IQR) and proportions (%) for the first ten quantitative measures of CD4+ T-cell counts and categorized VLs (above or below assay lower limit) after cART;

  2. Pearson correlation coefficient [12] and Bland-Altman agreement analysis [13] for quantitative measures of lowest CD4+ T-cell count and log of highest VL before cART;

  3. Kappa coefficient agreement [14] for categorized measures of lowest CD4+ T-cell count before cART (< 200 cell/mm3, 200 |-- 350 cell/mm3, ≥ 350 cell/mm3) and highest VL before cART (above or below assay lower limit);
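As an illustration of the second strategy, the sketch below computes a Pearson correlation coefficient and the classical Bland-Altman summary (mean difference and 95% limits of agreement, taken as the mean difference ± 1.96 SD) for paired measurements. The paired values are invented, the function names are hypothetical, and the analysis in the paper was run in SPSS and R, so this is only a sketch of the underlying formulas. Note that it yields limits of agreement, not the confidence interval and p-value for the mean difference, which would require a separate test on the paired differences.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(x, y):
    """Mean difference (bias) and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Invented paired lowest CD4+ counts (cells/mm3) for six patients.
cohort = [120, 245, 310, 88, 402, 199]
siscel = [118, 245, 316, 88, 399, 203]

r = pearson_r(cohort, siscel)
bias, lower, upper = bland_altman(cohort, siscel)
```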

Qualitative or categorized measures were derived from the quantitative measures as follows: for the lowest CD4+ T-cell count before cART, < 200 cell/mm3, 200 |-- 350 cell/mm3 and ≥ 350 cell/mm3; and for the highest VL before cART, above or below the assay lower limit.
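The categorization and the kappa calculation can be sketched as below. The sketch reads the interval notation 200 |-- 350 as 200 ≤ count < 350, which is an assumption about the notation; the category labels, function names and paired counts are invented for illustration, and the unweighted Cohen's kappa shown here is not necessarily the exact variant computed by the statistical software used in the study.

```python
from collections import Counter


def cd4_category(count):
    """Map a CD4+ T-cell count (cells/mm3) to one of three classes."""
    if count < 200:
        return "<200"
    if count < 350:
        return "200-349"
    return ">=350"


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two paired lists of labels.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented paired lowest CD4+ counts for eight patients.
cohort_counts = [150, 210, 349, 360, 90, 500, 198, 350]
siscel_counts = [150, 205, 352, 360, 95, 480, 201, 350]

cohort_cat = [cd4_category(c) for c in cohort_counts]
siscel_cat = [cd4_category(c) for c in siscel_counts]
kappa = cohen_kappa(cohort_cat, siscel_cat)
```

In this toy data two pairs straddle a cutoff (349 versus 352, and 198 versus 201), which shows how small numerical differences near a threshold can lower kappa even when the quantitative correlation is very high.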

The database was analyzed with the Statistical Package for the Social Sciences (SPSS) 24 for Windows (International Business Machines Corp, New York, USA) and R version 3.0.3 (http://www.r-project.org/).

Results

Of the 8674 individuals enrolled in HIV-BCS, 8007 were considered eligible for this study. Total follow-up time was 32,397 years. The median duration of follow-up was 3.51 years (IQR 1.63–6.13 years; maximum 11.51 years). As shown in Table 1, 80,302 CD4+ T-cell and 79,997 VL examination records were observed in HIV-BCS versus 94,083 CD4+ T-cell and 84,810 VL records in SISCEL. The general CD4+ T-cell HIV-BCS testing rate was 247 per 100 py versus 290 per 100 py in SISCEL, and the VL HIV-BCS testing rate was 246 per 100 py versus 261 per 100 py in SISCEL. The sites with the lowest HIV-BCS testing rates for CD4+ T-cell and VL were Manaus – FMT (158 per 100 py and 142 per 100 py, respectively) and Belém UREDIP (186 per 100 py and 179 per 100 py). Rio de Janeiro IPEC showed the highest HIV-BCS testing rates (322 per 100 py and 332 per 100 py).

Table 1.

Patients, total of follow-up time (years), number and testing rate of CD4+ T-cell counts and HIV viral load assessments for HIV-BCS and SISCEL databases

| Sites of HIV-BCS (a) | N | Follow-up time | Total CD4+ T-cell HIV-BCS | CD4+ T-cell HIV-BCS rate (b) | Total CD4+ T-cell SISCEL | CD4+ T-cell SISCEL rate (b) | Total viral load HIV-BCS | Viral load HIV-BCS rate (b) | Total viral load SISCEL | Viral load SISCEL rate (b) |
|---|---|---|---|---|---|---|---|---|---|---|
| I | 518 | 1520 | 2404 | 158 | 5869 | 385 | 2164 | 142 | 3154 | 207 |
| II | 618 | 2395 | 4463 | 186 | 9424 | 393 | 4297 | 179 | 6930 | 289 |
| III | 134 | 342 | 714 | 208 | 1411 | 412 | 671 | 196 | 789 | 230 |
| IV | 53 | 57 | 153 | 266 | 287 | 499 | 171 | 297 | 227 | 395 |
| V | 50 | 103 | 255 | 246 | 505 | 487 | 208 | 200 | 309 | 298 |
| VI | 410 | 1486 | 3694 | 248 | 4263 | 286 | 3735 | 251 | 4324 | 290 |
| VII | 290 | 1114 | 2379 | 213 | 4679 | 419.85 | 2278 | 204 | 3597 | 322 |
| VIII | 1906 | 5985 | 19,274 | 322 | 14,251 | 238.08 | 19,902 | 332 | 12,925 | 215 |
| IX | 519 | 2320 | 5389 | 232 | 5367 | 231.24 | 5288 | 227 | 6404 | 275 |
| X | 949 | 4538 | 12,005 | 264 | 12,691 | 279.63 | 12,094 | 266 | 15,801 | 348 |
| XI | 847 | 4881 | 10,985 | 225 | 12,907 | 264.41 | 10,940 | 224 | 9612 | 196 |
| XII | 408 | 2177 | 4974 | 228 | 6538 | 300.23 | 5022 | 230 | 5945 | 273 |
| XIII | 1305 | 5472 | 13,613 | 248 | 15,891 | 290.37 | 13,227 | 241 | 14,793 | 270 |
| Total | 8007 | 32,397 | 80,302 | 247 | 94,083 | 290 | 79,997 | 246 | 84,810 | 261 |

aI - Manaus – FMT; II - Belém UREDIP; III - Santarém - Municipal STF; IV - Recife - HC/UFPE; V - J. Guararapes - MUNICIPAL STF; VI - Salvador – HUPES; VII - Salvador - CEDAP; VIII - Rio de Janeiro – IPEC; IX Belo Horizonte - UFMG; X - São Paulo - CRT/SP; XI - São Paulo - Municipal Network; XII - SAE S.J.R.P - MUNICIPAL STF; XIII - Porto Alegre – PARTENON

bTesting rate per 100 person years
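The testing rate is the number of tests divided by the follow-up time, scaled to 100 person-years. The short sketch below recomputes the overall rates from the totals in Table 1; the function and variable names are illustrative. The integer parts of the computed values reproduce the rates reported in the Total row, although the rounding rule used in the paper is not stated.

```python
def rate_per_100_py(n_tests, follow_up_years):
    """Number of tests per 100 person-years of follow-up."""
    return 100 * n_tests / follow_up_years


follow_up = 32397  # total follow-up time in years (Table 1, Total row)
totals = {
    "CD4 HIV-BCS": 80302,
    "CD4 SISCEL": 94083,
    "VL HIV-BCS": 79997,
    "VL SISCEL": 84810,
}
rates = {name: rate_per_100_py(n, follow_up) for name, n in totals.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 100 py")
```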

The proportions of categorized CD4+ T-cell counts and VL assessments in the two databases are shown in Table 2. Missing data were more frequent in the SISCEL database than in the HIV-BCS, both for CD4+ T-cell (26.9% [95%CI 25.9 to 27.8%] versus 7.4% [95%CI 6.9 to 8.0%]) and for VL (35.0% [95%CI 33.9 to 36.0%] versus 12.2% [95%CI 11.5 to 12.9%]), so the proportions computed on the full data look different. However, when missing data were excluded, the proportions were similar and the confidence intervals overlapped in all categories. Additional file 1: Tables S1 and S2 show results according to site.

Table 2.

Proportion and 95% confidence interval for categorized measures of lowest CD4+ T-cell count and highest HIV viral load before cART for HIV-BCS and SISCEL databases

| Measure | HIV-BCS N | HIV-BCS % (95%CI) | SISCEL N | SISCEL % (95%CI) |
|---|---|---|---|---|
| Full data | | | | |
| CD4+ T-cell count (cell/mm3) | | | | |
| < 200 | 3878 | 48.4 (47.3–49.6) | 2857 | 35.7 (34.7–36.7) |
| 200 \|-- 350 | 2722 | 34.0 (32.9–35.0) | 2194 | 27.4 (26.4–28.4) |
| ≥ 350 | 812 | 10.1 (9.5–10.8) | 803 | 10.0 (9.3–10.8) |
| NDA | 595 | 7.4 (6.9–8.0) | 2153 | 26.9 (25.9–27.8) |
| Total | 8007 | | 8007 | |
| HIV viral load (a) | | | | |
| Below | 188 | 2.3 (2.0–2.7) | 115 | 1.4 (1.2–1.7) |
| Above | 6844 | 85.5 (84.8–86.3) | 5093 | 63.6 (62.6–64.6) |
| NDA | 975 | 12.2 (11.5–12.9) | 2799 | 35.0 (33.9–36.0) |
| Total | 8007 | | 8007 | |
| Excluding NDA | | | | |
| CD4+ T-cell count (cell/mm3) | | | | |
| < 200 | 2923 | 50.8 (49.5–52.1) | 2811 | 48.9 (47.6–50.1) |
| 200 \|-- 350 | 2131 | 37.0 (35.8–38.2) | 2166 | 37.6 (36.4–38.9) |
| ≥ 350 | 699 | 12.2 (11.3–13.0) | 776 | 13.5 (12.6–14.4) |
| Total | 5753 | | 5753 | |
| HIV viral load (a) | | | | |
| Below | 114 | 2.2 (1.8–2.6) | 115 | 2.2 (1.8–2.6) |
| Above | 5094 | 97.8 (97.4–98.2) | 5093 | 97.8 (97.4–98.2) |
| Total | 5208 | | 5208 | |

95% CI was based on 1000 bootstrap samples

NDA No data available

aViral load was grouped in: above or below assay lower limit of detection. The lower limit of detection varied according to method over the years between 400 and 40 copies / ml
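A bootstrap confidence interval for a proportion can be sketched as follows, using the HIV-BCS count for CD4+ T-cell < 200 cell/mm3 in the full data (3878 of 8007). The percentile method, the seed and the function name are assumptions, since the paper states only that 1000 bootstrap samples were used. Resampling a binary sample with replacement is equivalent to drawing each observation as a success with probability equal to the observed proportion, which is what the code does.

```python
import random


def bootstrap_proportion_ci(successes, n, n_boot=1000, seed=2018):
    """Percentile bootstrap 95% CI for a proportion (assumed method)."""
    rng = random.Random(seed)
    p_hat = successes / n
    # Each bootstrap replicate redraws n binary observations.
    props = sorted(
        sum(rng.random() < p_hat for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lower = props[int(0.025 * n_boot)]
    upper = props[int(0.975 * n_boot) - 1]
    return lower, upper


# HIV-BCS, full data, CD4+ T-cell count < 200 cell/mm3 (Table 2).
low, high = bootstrap_proportion_ci(3878, 8007)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

With a sample of this size the interval should land close to the one reported in Table 2, although the exact limits depend on the random seed and on the bootstrap variant used.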

The high proportions of missing data in the SISCEL database for the lowest CD4+ T-cell count (26.9%) and the highest HIV VL (35%) before cART were concentrated in the first years of patient inclusion: (1) for the CD4+ T-cell count, 33.9% in 2003 and 82.0% accumulated between 2003 and 2008; and (2) for the highest HIV VL, 32.3% in 2003 and 81.3% accumulated between 2003 and 2008.

The distribution of CD4+ T-cell counts was similar between the two databases: the first CD4+ T-cell count had a mean of 334 cells/mm3 (SD 204 cells/mm3) in HIV-BCS versus 356 cells/mm3 (SD 216 cells/mm3) in the SISCEL database. The proportion of detectable viral loads at the first measurement was 1.6 percentage points higher in HIV-BCS (Table 3). Across the ten measurements, both databases showed an increase in the mean CD4+ T-cell count (+ 197 cells/mm3 in HIV-BCS versus + 157 cells/mm3 in SISCEL) and a decrease in the proportion of detectable HIV viral loads (− 17.3% in HIV-BCS versus − 12.9% in SISCEL). Additional file 1: Table S3 shows the results for each site.

Table 3.

Descriptive statistics including mean, median, standard deviation (SD) and interquartile range (IQR) for the ten first quantitative measures for CD4+ T-cells and proportion of detectable viral load after cART for HIV-BCS and SISCEL databases

| Measure | HIV-BCS CD4+ T-cell, Mean (SD) | HIV-BCS CD4+ T-cell, Median (IQR) | HIV-BCS HIV viral load (a), Above (%) | SISCEL CD4+ T-cell, Mean (SD) | SISCEL CD4+ T-cell, Median (IQR) | SISCEL HIV viral load (a), Above (%) |
|---|---|---|---|---|---|---|
| Count 1 | 334 (204) | 307 (189–435) | 29.5 | 356 (216) | 327 (198–469) | 27.9 |
| Count 2 | 366 (207) | 341 (215–476) | 19.5 | 382 (219) | 355 (220–499) | 21.5 |
| Count 3 | 397 (221) | 369 (238–515) | 18.0 | 412 (235) | 381 (242–538) | 17.6 |
| Count 4 | 425 (235) | 395 (262–548) | 16.9 | 433 (241) | 403 (259–565) | 17.4 |
| Count 5 | 444 (239) | 418 (278–574) | 16.1 | 451 (248) | 423 (272–595) | 16.9 |
| Count 6 | 461 (246) | 427 (288–597) | 15.8 | 465 (254) | 433 (285–607) | 16.7 |
| Count 7 | 480 (252) | 455 (303–621) | 14.3 | 478 (260) | 445 (293–630) | 16.2 |
| Count 8 | 495 (258) | 465 (311–638) | 14.1 | 488 (264) | 460 (299–643) | 16.1 |
| Count 9 | 513 (261) | 486 (332–657) | 12.8 | 502 (270) | 474 (308–659) | 15.5 |
| Count 10 | 531 (272) | 503 (336–690) | 12.2 | 515 (282) | 482 (314–669) | 15.0 |

aAbove lower limit of detection. The lower limit of detection varied according to method over the years between 400 and 40 copies / ml

The overall correlation observed for the quantitative lowest CD4+ T-cell count before cART was 0.970 (p < 0.001); the sites with the lowest and highest correlations were Recife – HC/UFPE (0.921, p < 0.001) and J. Guararapes – MUNICIPAL STF (0.997, p < 0.001). For the log of the highest VL before cART, the overall correlation was 0.971 (p < 0.001); the lowest and highest correlations were observed at Rio de Janeiro – IPEC (0.950, p < 0.001) and Manaus – FMT (0.997, p < 0.001). In all cases the mean differences between the two data sources were not statistically different from zero (Table 4).

Table 4.

Pearson coefficient correlation (r) and Bland-Altman agreement analysis for quantitative measures of lowest CD4+ T-cell count and highest viral load before cART between HIV-BCS and SISCEL databases

| Sites of HIV-BCS (a) | CD4+ T-cell: r | CD4+ T-cell: p-value | CD4+ T-cell: Mean Diff | CD4+ T-cell: CI95% (c) | CD4+ T-cell: p-value # | HIV viral load (log): r | HIV viral load (log): p-value | HIV viral load (log): Mean Diff | HIV viral load (log): CI95% (c) | HIV viral load (log): p-value # |
|---|---|---|---|---|---|---|---|---|---|---|
| I | 0.994 | < 0.001 | 0.16 | −1.22 to 1.55 | 0.815 | 0.997 | < 0.001 | 0.0110 | −0.040 to 0.017 | 0.429 |
| II | 0.974 | < 0.001 | 0.14 | −0.67 to 0.95 | 0.733 | 0.990 | < 0.001 | −0.0026 | −0.032 to 0.027 | 0.859 |
| III | 0.956 | < 0.001 | 0.71 | −2.96 to 4.39 | 0.701 | 0.979 | < 0.001 | 0.0075 | −0.048 to 0.063 | 0.790 |
| IV | 0.921 | < 0.001 | 0.00 | b | b | 0.994 | < 0.001 | −0.0040 | −0.039 to 0.031 | 0.822 |
| V | 0.997 | < 0.001 | 0.00 | b | b | 0.995 | < 0.001 | −0.0100 | −0.031 to 0.011 | 0.327 |
| VI | 0.990 | < 0.001 | 1.38 | −1.7 to 4.51 | 0.386 | 0.965 | < 0.001 | −0.0110 | −0.040 to 0.018 | 0.453 |
| VII | 0.995 | < 0.001 | −0.03 | −0.11 to 0.38 | 0.318 | 0.989 | < 0.001 | −0.0026 | −0.036 to 0.035 | 0.989 |
| VIII | 0.956 | < 0.001 | −0.02 | −2.17 to 2.12 | 0.981 | 0.950 | < 0.001 | −0.0050 | −0.020 to 0.009 | 0.466 |
| IX | 0.946 | < 0.001 | 0.85 | −2.02 to 3.70 | 0.562 | 0.973 | < 0.001 | −0.0170 | −0.051 to 0.016 | 0.312 |
| X | 0.978 | < 0.001 | 0.84 | −1.85 to 3.54 | 0.540 | 0.965 | < 0.001 | 0.0060 | −0.010 to 0.030 | 0.634 |
| XI | 0.942 | < 0.001 | −0.96 | −3.51 to 1.59 | 0.460 | 0.975 | < 0.001 | 0.0030 | −0.014 to 0.021 | 0.686 |
| XII | 0.967 | < 0.001 | −0.50 | −1.5 to 0.58 | 0.361 | 0.990 | < 0.001 | 0.0040 | −0.027 to 0.035 | 0.800 |
| XIII | 0.971 | < 0.001 | −0.36 | −2.00 to 1.33 | 0.673 | 0.966 | < 0.001 | 0.0110 | −0.017 to 0.020 | 0.903 |
| Total | 0.970 | < 0.001 | 0.073 | −0.07 to 0.88 | 0.861 | 0.971 | < 0.001 | −0.0025 | −0.010 to 0.005 | 0.509 |

# Null hypothesis: mean difference (DIFF) = 0

aI - Manaus – FMT; II - Belém UREDIP; III - Santarém - Municipal STF; IV - Recife - HC/UFPE; V - J. Guararapes - MUNICIPAL STF; VI - Salvador – HUPES; VII - Salvador - CEDAP; VIII - Rio de Janeiro – IPEC; IX Belo Horizonte - UFMG; X - São Paulo - CRT/SP; XI - São Paulo - Municipal Network; XII - SAE S.J.R.P - MUNICIPAL STF; XIII - Porto Alegre – PARTENON;

bStatistics were not calculated because the standard deviation is zero;

cIncluding lower and upper limits of the confidence interval CI95%

The overall agreement coefficients were high for both categorized CD4+ T-cell counts (0.932, p < 0.001) and HIV VLs (0.996, p < 0.001). The Municipal Network in São Paulo presented the lowest kappa coefficient for both biomarkers: 0.867 (p < 0.001) for CD4+ T-cell counts and 0.855 (p < 0.001) for HIV VLs. The highest kappa agreement (1.00) was observed in J. Guararapes – MUNICIPAL STF for both indicators (Table 5).

Table 5.

Kappa coefficient agreement for categorized measures of lowest CD4+ T-cell count before cART and highest viral load before cART between HIV-BCS and SISCEL databases

| Sites of HIV-BCS (a) | CD4+ T-cell (b): Kappa | CD4+ T-cell (b): p-value | Viral load (c): Kappa | Viral load (c): p-value |
|---|---|---|---|---|
| I | 0.980 | P < 0.001 | 1.000 | P < 0.001 |
| II | 0.983 | P < 0.001 | 1.000 | P < 0.001 |
| III | 0.905 | P < 0.001 | 1.000 | P < 0.001 |
| IV | 0.889 | P < 0.001 | 1.000 | P < 0.001 |
| V | 1.000 | P < 0.001 | 1.000 | P < 0.001 |
| VI | 1.000 | P < 0.001 | 1.000 | P < 0.001 |
| VII | 0.985 | P < 0.001 | 1.000 | P < 0.001 |
| VIII | 0.885 | P < 0.001 | 1.000 | P < 0.001 |
| IX | 0.938 | P < 0.001 | 1.000 | P < 0.001 |
| X | 0.937 | P < 0.001 | 1.000 | P < 0.001 |
| XI | 0.867 | P < 0.001 | 0.855 | P < 0.001 |
| XII | 0.989 | P < 0.001 | 1.000 | P < 0.001 |
| XIII | 0.940 | P < 0.001 | 1.000 | P < 0.001 |
| Total | 0.932 | P < 0.001 | 0.996 | P < 0.001 |

aI - Manaus – FMT; II - Belém UREDIP; III - Santarém - Municipal STF; IV - Recife - HC/UFPE; V - J. Guararapes - MUNICIPAL STF; VI - Salvador – HUPES; VII - Salvador - CEDAP; VIII - Rio de Janeiro – IPEC; IX Belo Horizonte - UFMG; X - São Paulo - CRT/SP; XI - São Paulo - Municipal Network; XII - SAE S.J.R.P - MUNICIPAL STF; XIII - Porto Alegre – PARTENON;

bCategorized lowest CD4+ T-cell before cART was grouped in: < 200 cell/mm3, 200 |-- 350 cell/mm3, ≥ 350 cell/mm3;

cViral load was grouped in: above or below assay lower limit of detection. The lower limit of detection varied according to method over the years between 400 and 40 copies / ml

Discussion

CD4+ T-cell counts and VL are two biomarkers of response to antiretroviral treatment and of HIV disease progression that have been used to monitor HIV infection in clinical follow-up. The VL is the most important indicator of initial and sustained response to ART and should be measured in all patients infected with HIV on entry into treatment, at the onset of therapy and on a regular basis thereafter. The CD4 count is the most important laboratory indicator of immune status in HIV-infected patients. It is also the strongest predictor of HIV disease progression and subsequent survival rate, according to results of clinical trials and cohort studies [15–18].

This study proposed a validation method based on correlation and agreement coefficients for VL and CD4+ T-cell data from the HIV-BCS and SISCEL databases. Our interest was not to evaluate viral suppression or the immune restoration provided by antiviral therapy, since this has been widely described in the literature [19–21]. The main focus of this paper was to show that information from HIV/AIDS medical records is of high quality when compared with the same data obtained from other systems. The main difference between these two sources of information (HIV-BCS versus SISCEL databases) is the greater likelihood of human interference in the first database, especially in phase 1, which used paper questionnaires to collect information from the patient’s medical record (Fig. 1).

This paper also allowed us to study the distribution of CD4+ T-cell counts and HIV VL assessments according to research site, and in this regard we obtained important findings. Regarding the results presented in Table 1, sites such as Manaus – FMT and Belém – UREDIP, located in the north of Brazil, had a low density of laboratory tests in the HIV-BCS database (fewer than 2 measurements per person per year for CD4+ T-cell and VL), but in the SISCEL database these sites presented testing frequencies comparable to those seen in the other sites (more than 3 per year). This can be explained by the incompleteness of medical records as far as laboratory results are concerned. In contrast, Rio de Janeiro – IPEC showed more CD4+ T-cell and VL records in the HIV-BCS database (more than 3 per year for CD4+ T-cell and VL) than in SISCEL, as this site is a major reference center for HIV research and clinical care. Furthermore, its laboratory is part of the National Network of Laboratories for CD4+ T-cell, VL and HIV genotyping tests, and as the site has a large and busy outpatient clinic, we suggest that an internal flow that provides CD4+ T-cell and VL results directly to clinicians could have led to the partial feeding of the SISCEL database.

Viera and Garrett [14] suggested operational cutoff points for the kappa coefficient: less than chance agreement (< 0); slight agreement (0.01 to 0.20); fair agreement (0.21 to 0.40); moderate agreement (0.41 to 0.60); substantial agreement (0.61 to 0.80); and almost perfect agreement (0.81 to 1.00). Mukaka [12] also suggested a rule of thumb for interpreting the size of a correlation coefficient: negligible correlation (0.0 to 0.30); low correlation (0.30 to 0.50); moderate correlation (0.50 to 0.70); high correlation (0.70 to 0.90); and very high correlation (0.90 to 1.00). Based on these authors, we feel comfortable saying that this study showed high correlation and agreement for CD4+ T-cell counts and VL assessments, both overall and per site.
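The two interpretation scales can be written as simple lookup functions, which makes it easy to label the coefficients reported in Tables 4 and 5. This is an illustrative sketch with hypothetical function names. Two simplifications are assumptions: values between 0 and 0.01 are labelled as slight agreement, and the correlation bands are treated as half-open, with 0.70 taken as the boundary between moderate and high.

```python
def kappa_label(kappa):
    """Label a kappa coefficient using the cutoffs quoted above."""
    if kappa < 0:
        return "less than chance agreement"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return f"{label} agreement"
    raise ValueError("kappa cannot exceed 1")


def correlation_label(r):
    """Label the size of a correlation coefficient (sign ignored)."""
    size = abs(r)
    if size > 1:
        raise ValueError("correlation must lie between -1 and 1")
    bands = [
        (0.30, "negligible"),
        (0.50, "low"),
        (0.70, "moderate"),
        (0.90, "high"),
    ]
    for upper, label in bands:
        if size < upper:
            return f"{label} correlation"
    return "very high correlation"


overall_kappa_cd4 = kappa_label(0.932)    # overall kappa, CD4+ T-cell
overall_kappa_vl = kappa_label(0.996)     # overall kappa, viral load
overall_r_cd4 = correlation_label(0.970)  # overall r, CD4+ T-cell
```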

In previous reports, authors [5–7] have compared data from HIV surveillance information systems with those obtained from medical records or independent databases. They found substantial or almost perfect agreement for age, race, and gender, but poorer agreement for mode of HIV acquisition, CD4+ cell counts, and the more complex categorization of AIDS case definition.

Moreover, studies have been conducted to validate self-reported health information against registered information from medical records. For instance, Kalichman, Rompa and Cage [22] found good agreement in self-reporting of CD4 cell counts, but not of HIV VLs. In a marginalized population in particular, agreement between self-reports and medical records was poor for ambulatory visits, poor to fair for medication use, and poor for laboratory tests. However, the agreement for CD4 count was substantially better [23]. Using another strategy, An et al. [24] showed that agreement between self-reports and medical records was good for HIV status and date of first positive HIV test, but poor for date of last negative HIV test.

Two implications can be drawn from this study: (1) concerning the HIV/AIDS cohort study in Brazil, we believe that no additional transcription of CD4+ T-cell counts and VL results from patients’ medical records is necessary, given the reliable quality of the SISCEL database; and (2) concerning the health services, the high correlation and agreement of their data mean that the therapeutic success of patients can be evaluated on a permanent basis, without the need to carry out specific studies for this purpose.

The main strength of our study is the linkage of two robust databases: the HIV-BCS database, with 8007 subjects, a long follow-up time (32,397.12 years) and thousands of test results accumulated over the years, and the SISCEL database, which forms part of the national HIV/AIDS register in Brazil. All T-cell counts (CD4+/CD8+) and HIV VL assessments performed in the National Network of Laboratories (NNT) are mandatorily registered in this database.

Despite these results, some limitations of our study should be pointed out. The high proportion of missing data in the SISCEL database for the lowest CD4+ T-cell count and the highest HIV VL before cART was concentrated in the first years of patient inclusion. The SISCEL system has been in operation since 2002 and, although a study has already shown its good quality for epidemiological surveillance [20], it is fair to speculate that it did not perform very well in its first years of use.

We encourage further studies of the SISCEL system to verify whether the quality of its information has improved over time, as well as new studies linking HIV-BCS with other national databases such as SICLOM (Medication Logistics Control System), SINAN (Notifiable Diseases Information System, a surveillance system) or SIM (Mortality Information System). Such studies could validate data on socio-demographic characteristics, loss to follow-up, mortality and cART schemes.

Conclusion

The current study confirms that CD4+ T-cell counts and HIV VL assessments from HIV-BCS have high correlation and agreement with data obtained from SISCEL, especially after exclusion of missing data. The HIV-BCS database has a lower proportion of missing data concerning CD4+ T-cell counts and HIV VL, as compared to SISCEL.

Additional file

Additional file 1: (136.1KB, docx)

Table S1. Proportion and 95% confidence interval for qualitative measures of lowest CD4+ T-cell count before cART per site. Table S2. Proportion and 95% confidence intervals for qualitative measures of viral load before cART per site. Table S3. Descriptive statistics including mean, median, standard deviation (SD) and interquartile range (IQR) for the first ten quantitative measures for CD4+ T-cell and viral load after cART. (DOCX 101 kb)

Acknowledgements

The authors thank the researcher Jackeline Oliveira Gomes for her herculean and dedicated work with the verification of records from the HIV-BCS database and National HIV/AIDS register database in order to create a unique patient code.

Funding

The study is funded by the Brazilian National Council for Scientific and Technological Development, the Brazilian National Ministry of Health, the Pan American Health Organization, the STD/AIDS Referral and Training Centre of the São Paulo State Department of Health and the Evandro Chagas Clinical Research Institute of the Oswaldo Cruz Foundation. Alex Jones Flores Cassenote was given a Ph.D. Student scholarship from the São Paulo Research Foundation – FAPESP (proc. 2013/18158–0). The funders had no role in study design, data collection, analysis, decision to publish, or preparation of the manuscript.

Availability of data and materials

The datasets generated and/or analysed during the current study are not publicly available owing to a restriction related to participant confidentiality imposed on the authors by the Ethics Committee of the Medical School of the University of São Paulo. Data from this paper are available upon request to the Ethics Committee of the Medical School of the University of São Paulo. Mailing address: 251 Dr. Arnaldo Avenue – Cerqueira César – 01246-000 – São Paulo – SP – Brazil. Phone: +55 (11) 3893–4401. Contact: Dr. Maria Aparecida Azevedo Koike Folgueira.

Abbreviations

cART

Combination Antiretroviral Therapy

CD4+ T-cell

CD4+ T-cell lymphocytes

HIV

Human immunodeficiency virus

HIV-BCS

HIV-Brazil cohort study

IPEC

Instituto de Pesquisas Clínicas Evandro Chagas (Evandro Chagas Institute of Clinical Research)

IQR

Interquartile range

IRB

Institutional Review Boards

NNT

National Network of Laboratories

Py

Person-year

SICLOM

Sistema de Controle Logístico de Medicamentos (Medication Logistics Control System)

SIM

Sistema de Informação sobre Mortalidade (Mortality Information System)

SINAN

Sistema de Informação de Agravos de Notificação (Notifiable Diseases Information System)

SISCEL

Brazilian Laboratory Tests Control System

Authors’ contributions

AJFC, AG, MME, JMA and AACS conceived and designed the study; AJFC, AG and MME participated in data acquisition and analysis; AJFC, JMA and AACS participated in the interpretation of the results; AJFC and AACS drafted the article. All authors read and approved the final manuscript.

Ethics approval and consent to participate

Ethics Statement: This study was approved by the Ethics in Research Committee of the Medical School of the University of São Paulo (Universidade de São Paulo) (Decision #229/13). As this is a retrospective study, the aforementioned IRB waived the requirement for free and informed consent. Consent and responsibility were at the discretion of the directors of each participating site.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Alex Jones Flores Cassenote, Email: cassenote@usp.br.

Alexandre Grangeiro, Email: ale.grangeiro@gmail.com.

Maria Mercedes Escuder, Email: mmescuder@gmail.com.

Jair Minoro Abe, Email: jairabe@uol.com.br.

Aluísio Augusto Cotrim Segurado, Email: segurado@usp.br.

References

  • 1. Lau B, Gange SJ, Moore RD. Interval and clinical cohort studies: epidemiological issues. AIDS Res Hum Retrovir. 2007;23(6):769–776. doi: 10.1089/aid.2006.0171.
  • 2. McCoy SI, Jones B, Leone PA, Napravnik S, Quinlivan EB, Eron JJ, et al. Variability of the date of HIV diagnosis: a comparison of self-report, medical record, and HIV/AIDS surveillance data. Ann Epidemiol. 2010;20(10):734–742. doi: 10.1016/j.annepidem.2010.05.001.
  • 3. Grangeiro A, Escuder MM, Cassenote AJ, Souza RA, Kalichman AO, Veloso VG, et al. The HIV-Brazil cohort study: design, methods and participant characteristics. PLoS One. 2014;9(5):e95673. doi: 10.1371/journal.pone.0095673.
  • 4. Brazil. Ministry of Health. Department of STD, AIDS and Viral Hepatitis. Sistemas de Informação. Available from: http://www.aids.gov.br/pt-br/gestores/sistemas-de-informacao. Accessed 3 June 2018.
  • 5. Paz LC, Luiza VL, Dhalia CBC. Avaliação da qualidade dos dados sistema de controle de exames laboratoriais (SISCEL) como fonte de identificação de casos de aids em crianças. Cad Saúde Colet. 2010;18(1):33–43.
  • 6. Gallagher KM, Jara M, Demaria A, Jr, Seage GR, Heeren T. The reliability of passively collected AIDS surveillance data in Massachusetts. Ann Epidemiol. 2003;13:100–104. doi: 10.1016/S1047-2797(02)00265-X.
  • 7. Klevens RM, Fleming PL, Li J, Gaines CG, Gallagher K, Schwarcz S, et al. The completeness, validity, and timeliness of AIDS surveillance data. Ann Epidemiol. 2001;11:443–449. doi: 10.1016/S1047-2797(01)00256-3.
  • 8. Pati R, Robbins RS, Braunstein SL. Validation of retention in HIV care status using the New York city HIV surveillance registry and clinical care data from a large HIV care center. J Public Health Manag Pract. 2017;23(6):564–570. doi: 10.1097/PHH.0000000000000515.
  • 9. Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman & Hall/CRC; 1993. p. 436.
  • 10. Wayne WD. Biostatistics: a foundation for analysis in the health sciences. 9. Houston: Wiley; 2009. p. 782.
  • 11. Pereira JCR. Bioestatística em outras palavras. São Paulo: Edusp; 2010. p. 424.
  • 12. Mukaka MM. A guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012;24(3):69–71.
  • 13. Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician. 1983;32(3):307–317. doi: 10.2307/2987937.
  • 14. Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37(5):360–363.
  • 15. Murray JS, Elashoff MR, Iacono-Connors LC, Cvetkovich TA, Struble KA. The use of plasma HIV RNA as a study endpoint in efficacy trials of antiretroviral drugs. AIDS. 1999;13(7):797–804. doi: 10.1097/00002030-199905070-00008.
  • 16. HIV Surrogate Marker Collaborative Group. Human immunodeficiency virus type 1 RNA level and CD4 count as prognostic markers and surrogate end points: a meta-analysis. AIDS Res Hum Retrovir. 2000;16(12):1123–1133.
  • 17. Langford SE, Ananworanich J, Cooper DA. Predictors of disease progression in HIV infection: a review. AIDS Res Ther. 2007;4(1):11. doi: 10.1186/1742-6405-4-11.
  • 18. Panel on Antiretroviral Guidelines for Adults and Adolescents. Guidelines for the use of antiretroviral agents in HIV-1-infected adults and adolescents. Department of Health and Human Services. Available at https://aidsinfo.nih.gov/contentfiles/lvguidelines/AdultandAdolescentGL.pdf. Accessed 29 Nov 2018.
  • 19. Autran B, Carcelain G, Li TS, Blanc C, Mathez D, Tubiana R, et al. Positive effects of combined antiretroviral therapy on CD4+ T cell homeostasis and function in advanced HIV disease. Science. 1997;277:112–116. doi: 10.1126/science.277.5322.112.
  • 20. Palella FJ, Jr, et al. Declining morbidity and mortality among patients with advanced human immunodeficiency virus infection. HIV outpatient study investigators. N Engl J Med. 1998;338:853–860. doi: 10.1056/NEJM199803263381301.
  • 21. French MA. Antiretroviral therapy. Immune restoration disease in HIV-infected patients on HAART. AIDS Read. 1999;9(8):548–549.
  • 22. Kalichman SC, Rompa D, Cage M. Reliability and validity of self-reported CD4 lymphocyte count and viral load test results in people living with HIV/AIDS. Int J STD AIDS. 2000;11:579–585. doi: 10.1258/0956462001916551.
  • 23. Cunningham CO, Li X, Ramsey K, Sohler NL. A comparison of HIV health services utilization measures in a marginalized population: self-report versus medical records. Med Care. 2007;45(3):264–268. doi: 10.1097/01.mlr.0000250294.16240.2e.
  • 24. An Q, Chronister K, Song R, Pearson M, Pan Y, Yang B, Khuwaja S, Hernandez A, Hall HI. Comparison of self-reported HIV testing data with medical records data in Houston, TX 2012–2013. Ann Epidemiol. 2016;26(4):255–260. doi: 10.1016/j.annepidem.2016.02.013.



Articles from BMC Infectious Diseases are provided here courtesy of BMC
