Abstract
Background:
Studying team-based primary care using 100% national outpatient Medicare data is not feasible, due to limitations in the availability of this dataset to researchers.
Methods:
We assessed whether analyses using different sets of Medicare data can produce results similar to those from analyses using 100% data from an entire state, in identifying primary care teams through social network analysis (SNA). First, we used data from 100% of Medicare beneficiaries, restricted to those within a primary care service area (PCSA), to identify primary care teams. Second, we used data from a 20% sample of Medicare beneficiaries, defining shared care between two providers using two different cutoffs for the minimum required number of shared patients, to identify primary care teams.
Results:
The team practices identified with SNA using the 20% sample and a cutoff of six patients shared between two primary care providers had good agreement with team practices identified using statewide data (F-measure: 90.9%). Use of 100% data within a small geographic boundary, such as PCSAs, had an F-measure of 83.4%. The percentages of practices identified from these datasets that coincided with practices identified from statewide data were 86% and 100%, respectively.
Conclusion:
Depending on specific study purposes, researchers could use either 100% data from Medicare beneficiaries in randomly selected PCSAs, or data from a 20% national sample of Medicare beneficiaries to study team-based primary care in the US.
Keywords: Medicare, social network analysis, primary care
Introduction
Team-based care has been a trend in healthcare delivery reform, and interest in studying the effectiveness of such care models is growing [1]. One challenge in conducting this kind of research is identifying the teams that provide team-based primary care from administrative data sets. Our previous study showed the potential of using Medicare data to identify team-based primary care through social network analysis (SNA) [2]. That study used results from a survey of primary care practices and linked these practices to 100% Texas Medicare data. However, studying team-based primary care using 100% national outpatient Medicare data is not feasible, because researchers can only request samples including up to 20% of Medicare beneficiaries; 100% data can be requested only for beneficiaries within a certain geographic region or with specific diseases [3]. Given SNA’s potential for identifying team care, as well as current limitations on data acquisition, we assessed whether analyses using different sets of Medicare data can produce results similar to those from analyses using 100% data from an entire state (Texas). First, we used data from 100% of Medicare beneficiaries within each primary care service area (PCSA) [4]. We chose PCSAs not only because these specific data can be requested, but also to test whether primary care teams would be identified differently in smaller geographic regions developed for measuring primary care resources, utilization, and outcomes [5]. Second, we used data from a 20% national sample of Medicare beneficiaries and defined shared care by two providers using two different cutoffs for the minimum required number of patients shared between them. Findings from this study can provide guidance for future nationwide studies of health care that require identification of team-based care. Such studies could assess how the adoption and impact of team-based care models vary across the US, and how these differences relate to the degree of primary care physician shortage and to differences in state regulations allowing nurse practitioners to practice and prescribe independently [6,7].
Methods
Data Source
We used 100% Texas Medicare claims from 2015. The files included the Medicare beneficiary summary file, the Carrier files, and the Outpatient Standard Analytical Files. Data on population, racial composition, and income level for the studied PCSAs were obtained from the 2015 American Community Survey.
Primary care team identification via social network analysis
Our earlier work demonstrated the possibility of using the techniques of SNA with Medicare data to identify primary care providers (PCPs) who worked in the same practice [2], and we employed a similar analytical pipeline in the current work, as follows. PCPs were identified using Centers for Medicare and Medicaid Services (CMS) provider specialty codes, including general practitioners (01), family physicians (08), general internists (11), geriatricians (38), nurse practitioners (50), and physician assistants (97). For nurse practitioners and physician assistants, we further used taxonomy codes (Appendix Table 1) to identify those in primary care. In the first step of our SNA, we identified PCP dyads who shared patients: two PCPs were defined as sharing a patient if they both submitted Medicare claims for primary care visits [2] (Appendix Table 2) for that patient in 2015. We then generated a network graph to identify connections between providers. Using the Walktrap community-finding algorithm, with a random-walk length of four steps and edges weighted by the number of patients shared between PCPs, we identified PCP communities [8–10]. In this algorithm, random walks are used to compute distances between providers; providers are then assigned, through hierarchical clustering, to groups with small intra-community and larger inter-community distances.
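As an illustration, the core of this step can be sketched in R with the igraph package [15]; the dyad data frame `dyads` and its columns `npi_a`, `npi_b`, and `n_shared` are hypothetical names for the sketch, not the study's actual variables:

```r
# Sketch only: build a weighted PCP network from shared-patient dyads and
# detect communities with Walktrap (random-walk length of 4), as described above.
library(igraph)

# Hypothetical dyad table: each row is a pair of PCPs and their shared-patient count
dyads <- data.frame(
  npi_a    = c("A", "A", "B", "C", "D"),
  npi_b    = c("B", "C", "C", "D", "E"),
  n_shared = c(45, 32, 60, 31, 38)
)

# Undirected provider network; the number of shared patients is the edge weight
g <- graph_from_data_frame(dyads, directed = FALSE)
E(g)$weight <- dyads$n_shared

# Walktrap community detection, weighted by shared patients, with 4-step walks
wt <- cluster_walktrap(g, weights = E(g)$weight, steps = 4)

membership(wt)   # community (practice) assignment for each PCP
modularity(wt)   # strength of the detected community structure
```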
We identified primary care teams through SNA using three datasets. First, within the 100% Texas Medicare beneficiaries, we selected PCPs who had at least one dyad, defined as sharing at least 30 patients with another PCP. This method was validated in our prior study [2], and the primary care teams identified by this method were therefore considered the reference for comparison. Second, among a 20% sample of Texas Medicare beneficiaries, we selected PCPs who had a dyad, defined as sharing at least 6 or at least 11 patients with another PCP. CMS offers virtual access to its data through the CMS Virtual Research Data Center (VRDC); however, the VRDC does not provide the R packages needed to conduct SNA. CMS also has a cell-size suppression policy that prevents users from downloading data containing cells with counts below 11, which prevented us from obtaining the actual count for any dyad with fewer than 11 shared patients. This explains our rationale for choosing 11, in addition to 6, as a cutoff for the number of shared patients. Third, for each PCSA in Texas, we selected PCPs who had a dyad, defined as sharing at least 30 patients among all Medicare beneficiaries living within that PCSA. Further, for this particular analysis, only PCSAs whose derived communities had a modularity ≥ 0.4 were chosen for further analyses. Modularity is a measure of community clustering, with high modularity indicating dense connections [11]. A modularity equal to or larger than 0.4 is a clear indication that the identified communities are well-defined modules within the network [12] (see footnote under Table 1). There were 405 PCSAs in Texas. Among them, 288 PCSAs had at least one PCP dyad sharing at least 30 patients, and 151 of the 288 PCSAs had a modularity ≥ 0.4.
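A minimal sketch of how the dyad cutoff and the modularity screen could be applied within each PCSA, continuing the hypothetical `dyads` format above (one dyad data frame per PCSA is assumed):

```r
# Sketch only: per-PCSA SNA with a shared-patient cutoff and a modularity >= 0.4 screen
library(igraph)

run_pcsa_sna <- function(dyads, min_shared = 30, min_modularity = 0.4) {
  dyads <- dyads[dyads$n_shared >= min_shared, ]       # keep qualifying dyads only
  if (nrow(dyads) == 0) return(NULL)                   # PCSA has no qualifying dyad
  g <- graph_from_data_frame(dyads, directed = FALSE)
  E(g)$weight <- dyads$n_shared
  wt <- cluster_walktrap(g, weights = E(g)$weight, steps = 4)
  if (modularity(wt) < min_modularity) return(NULL)    # drop PCSAs with weak community structure
  membership(wt)                                       # communities within this PCSA
}

# Example use with a hypothetical list of per-PCSA dyad tables `dyads_by_pcsa`:
# pcsa_communities <- Filter(Negate(is.null), lapply(dyads_by_pcsa, run_pcsa_sna))
```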
Table 1.
Dataset | No. shared pts between providers | No. providers | Modularity* | No. communities* |
---|---|---|---|---|
2015, 100% | ≥ 30 | 6511 | 0.95 | 1511 |
2015, with PCSA boundary | ≥ 30 | 5294 | ≥ 0.4** | 1525 |
2015, 20% | ≥ 6 | 7230 | 0.94 | 1558 |
2015, 20% | ≥ 11 | 4311 | 0.96 | 1156 |
* The modularity was estimated from SNA where the number of shared patients was used as the weight of the edge, i.e., the tie between two providers. Modularity is a measure of community clustering, with high modularity indicating the density of connections within communities. A network is modular insofar as the modules within it are densely connected, while connections between modules are sparser. The network is the entire state for the statewide data analysis and the PCSA for the primary care service area (PCSA) data analysis.
** The SNAs were performed within each PCSA. The 1525 identified communities were located in the 151 PCSAs with a modularity ≥ 0.4. A previous study suggested that a modularity larger than 0.3–0.4 is a clear indication that the subgraphs of the corresponding partition are modules [12]. Therefore, we used the cutoff of 0.4 to identify the communities that are in well-defined modules within the PCSA.
PCSA Characteristics
Census tracts are nested within PCSAs, and therefore some PCSA characteristics, such as population, number of non-Hispanic whites, and median household income, were derived from census-tract-level estimates. For household income, we created two measures from the census-tract-level medians of household income, either weighting or not weighting by the number of households in the census tract. We also included the median of the census-block-level area deprivation index (ADI) [13] and calculated the proportion of Medicare beneficiaries enrolled in any Medicare Advantage plan in 2015. PCP availability in each PCSA was calculated as the number of PCPs who had any Medicare Part B bills in 2015 per thousand Medicare beneficiaries in that PCSA.
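For illustration, these PCSA-level income and PCP-availability measures could be computed as follows; the data frames `tracts` and `pcsa` and their column names are assumptions for the sketch, not the study's actual files:

```r
# Sketch only: PCSA measures from census-tract estimates and provider counts

# Weighted median: first value whose cumulative weight reaches half the total weight
weighted_median <- function(x, w) {
  o <- order(x); x <- x[o]; w <- w[o]
  x[which(cumsum(w) / sum(w) >= 0.5)[1]]
}

# Hypothetical census-tract estimates nested within PCSAs
tracts <- data.frame(
  pcsa_id       = c(1, 1, 1, 2, 2),
  median_income = c(41000, 52000, 60000, 38000, 45000),
  n_households  = c(1200, 800, 300, 2500, 400)
)

by_pcsa <- split(tracts, tracts$pcsa_id)
pcsa_income <- data.frame(
  pcsa_id           = names(by_pcsa),
  income_unweighted = sapply(by_pcsa, function(d) median(d$median_income)),
  income_weighted   = sapply(by_pcsa, function(d)
    weighted_median(d$median_income, d$n_households))
)

# PCP availability: PCPs with any 2015 Part B bill per 1,000 beneficiaries in the PCSA
pcsa <- data.frame(pcsa_id = c(1, 2), n_pcps = c(45, 12), n_beneficiaries = c(3200, 1500))
pcsa$pcp_per_1000 <- 1000 * pcsa$n_pcps / pcsa$n_beneficiaries
```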
Statistical Analysis
SNA results from the three datasets were summarized, and the agreement among the identified communities (primary care practices) was examined. The 1511 communities identified from 100% 2015 Medicare data using PCP dyads sharing at least 30 patients were considered the reference for comparison [2]. Communities identified from the SNA analyses in the 20% data or in the data from selected PCSAs were referred to as “testing” communities. For each testing community, we identified its matched reference community by examining the overlap of PCPs between the two communities: the matched reference community was the one containing the largest number of PCPs that appeared in both the testing and reference communities. Less than 2% of testing communities were matched to multiple reference communities because of a tie in the number of overlapping PCPs. The measures of agreement included recall (among the PCPs in the reference community, the percent of those PCPs also in the testing community), one minus purity (the number of additional PCPs in the testing community not found in the reference community, as a percent of the number of PCPs in the reference community), and the F-measure, which is the harmonic mean of the purity and recall values of each community [14] (see footnote under Table 2). The maximum value of the F-measure is one, which indicates perfect agreement. All analyses were performed using SAS version 9.4 (SAS Institute Inc), except for the SNAs, which were performed using the igraph package in R [15].
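The three agreement measures can be illustrated with a short R sketch for one testing community and its matched reference community, following the definitions in the Table 2 footnotes (the provider IDs here are hypothetical):

```r
# Sketch only: recall, 1-purity, and F-measure for one matched pair of communities
agreement <- function(ref_pcps, test_pcps) {
  overlap <- length(intersect(ref_pcps, test_pcps))
  recall  <- overlap / length(ref_pcps)                      # share of reference PCPs recovered
  one_minus_purity <- length(setdiff(test_pcps, ref_pcps)) /
    length(ref_pcps)                                         # extra PCPs relative to reference size
  purity  <- 1 - one_minus_purity
  f       <- 2 * purity * recall / (purity + recall)         # harmonic mean of purity and recall
  c(recall = recall, one_minus_purity = one_minus_purity, f_measure = f)
}

# Example: reference practice {A, B, C, D}; testing practice {A, B, C, E}
agreement(ref_pcps = c("A", "B", "C", "D"), test_pcps = c("A", "B", "C", "E"))
# recall = 0.75, 1-purity = 0.25, F-measure = 0.75
```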
Table 2.
Testing data | Number of communities# | Recall§ (%) | 1-purity§ (%) | F-measure§ (%) |
---|---|---|---|---|
Mean ± SD, Median (IQR) | | | | |
2015, with primary care service area (PCSA) boundary | 1525 | 80.8 ± 33.0, 100.0 (66.7–100.0) | 1.4 ± 6.6, 0.0 (0.0–0.0) | 83.4 ± 29.8, 100.0 (80.0–100.0) |
2015, 20%, ≥ 6 shared patients** | 1339 | 93.9 ± 16.4, 100.0 (100.0–100.0) | 9.3 ± 16.4, 0.0 (0.0–16.7) | 90.9 ± 15.8, 100.0 (85.7–100.0) |
2015, 20%, ≥ 11 shared patients | 1155 | 73.8 ± 29.9, 80.0 (50.0–100.0) | 1.1 ± 6.2, 0.0 (0.0–0.0) | 80.3 ± 25.3, 88.9 (66.7–100.0) |
# The communities shown here are the testing communities that contained the largest number of providers appearing in both the testing and reference communities. When multiple reference communities matched one testing community, we calculated the average agreement measures for that testing community.
§ Recall: For each predicted cluster C, recall quantifies the percentage coverage by C of the validated partition P in which C has the most elements. In our study, we calculated, among the primary care providers (PCPs) in the reference community (identified from 100% data), the percent of those PCPs also in the testing community.
1-purity: Purity quantifies the degree to which predicted clusters are homogeneous, in the sense of whether a predicted cluster contains elements of only one validated partition. A purity of 1 indicates that a predicted cluster contains individuals from only one validated partition. In our study, we calculated the number of additional PCPs in the testing community who were not found in the reference community (identified from 100% data), as a percent of the number of PCPs in the reference community.
F-measure: the harmonic mean of recall and purity, i.e., F = 2 × recall × purity / (recall + purity).
** It was not feasible to survey the 1511 practices identified from 100% statewide data. Therefore, we selected the 8 practices that were surveyed in our previous study and had at least 3 PCPs, and compared the communities identified from 100% data and 20% data. We found that 6 of the 8 practices had a recall measure of >90% and a purity measure of 100%.
Results
Table 1 shows that the number of communities identified from SNA in PCSAs or from SNA using the 20% sample with PCP dyads sharing at least 6 patients was close to the number of communities (or team practices) identified from SNA using 100% statewide data (1525 and 1558 vs. 1511). Table 2 presents the agreement between communities detected from the different datasets. When we compared the communities identified from PCP dyads sharing at least 6 patients in the 20% sample to those identified in statewide data, the average recall was 93.9%, indicating that, on average, 93.9% of the PCPs in a team practice identified using statewide data were found in the matching team practice identified from SNA using the 20% sample. The average purity for this comparison was 90.7%, indicating that less than 10% of the PCPs in a team practice identified by SNA using the 20% sample were not found in the matching practice from statewide data. The average F-measure was 90.9%, demonstrating good overall agreement. When we compared the practices identified from SNA in PCSAs to those from SNA in statewide data, the recall and F-measure were lower, but the purity was higher (80.8%, 83.4%, and 98.6%, respectively). Overall, the communities identified from PCP dyads sharing at least 11 patients in the 20% sample had lower recall and F-measures than those identified from the other two SNA analyses.
Because the communities identified from PCSAs depended on the strength of modularity, we conducted additional analyses comparing the PCSAs with modularity ≥ 0.4 to those with modularity < 0.4 (Table 3) to illustrate the characteristics of, and limitations on, using PCSAs to identify primary care teams. In general, the PCSAs that met the modularity criterion had larger populations, higher household income, lower ADI, more enrollees covered by Medicare Advantage plans, a lower proportion of residents aged 65 or older, and more patients shared between PCPs.
Table 3.
PCSA characteristic | Modularity* ≥ 0.4 (N=151) | Modularity* < 0.4 (N=137) | p value |
---|---|---|---|
Mean ± SD, Median (IQR) | | | |
Population 65 or older, % | 12.5 ± 4.5, 11.8 (9.3–14.7) | 15.9 ± 5.1, 15.3 (12.1–19.7) | <.0001 |
Non-Hispanic White, % | 36.2 ± 16.0, 38.8 (26.9–47.1) | 40.5 ± 16.9, 44.1 (30.8–54.2) | 0.0262 |
Medicare Advantage member, % | 32.8 ± 11.1, 31.1 (24.2–39.5) | 25.3 ± 10.1, 23.1 (17.9–30.7) | <.0001 |
Area Deprivation Index (ADI) (percentile)¥ | 60.1 ± 19.2, 64 (45–74.5) | 69.7 ± 16.4, 73 (58.5–81) | <.0001 |
N (%) | | | |
Population | | | <.0001 |
Q1: <14003 | 10 (6.6) | 62 (45.3) | |
Q2: 14003–38338 | 26 (17.2) | 46 (33.6) | |
Q3: 38339–115728 | 48 (31.8) | 24 (17.5) | |
Q4: >115728 | 67 (44.4) | 5 (3.6) | |
Median household income (dollar)# | | | 0.0058 |
Q1: <40358 | 32 (21.2) | 40 (29.2) | |
Q2: 40358–45838 | 30 (19.9) | 42 (30.7) | |
Q3: 45839–57430 | 40 (26.5) | 32 (23.4) | |
Q4: >57430 | 49 (32.5) | 23 (16.8) | |
Median household income (dollar), weighted‡ | | | 0.0030 |
Q1: <40893 | 33 (21.9) | 39 (28.5) | |
Q2: 40893–47815 | 29 (19.2) | 43 (31.4) | |
Q3: 47816–57399 | 39 (25.8) | 33 (24.1) | |
Q4: >57399 | 50 (33.1) | 22 (16.1) | |
Primary care provider availability§ | | | 0.0261 |
Q1: <14.1 | 28 (18.5) | 44 (32.1) | |
Q2: 14.1–25.2 | 46 (30.5) | 26 (19.0) | |
Q3: 25.3–37.9 | 40 (26.5) | 33 (24.1) | |
Q4: >37.9 | 37 (24.5) | 34 (24.8) | |
Number of provider dyads sharing ≥ 30 patients | | | <.0001 |
Q1: <2 | 13 (8.6) | 61 (44.5) | |
Q2: 3–9 | 24 (15.9) | 46 (33.6) | |
Q3: 10–25 | 50 (33.1) | 22 (16.1) | |
Q4: >25 | 64 (42.4) | 8 (5.8) | |
¥ PCSAs consist of census blocks. This variable is the median of the census-block-level national ADI percentiles.
# PCSAs consist of census tracts. This variable is the median of the census-tract-level median household income.
‡ This variable is the median of the census-tract-level median household income, weighted by the number of households in each census tract.
§ Calculated as the number of primary care providers who had any Medicare Part B bills in 2015 per thousand Medicare beneficiaries in the PCSA.
* The modularity was estimated from SNA where the number of shared patients was used as the weight of the edge, i.e., the tie between two providers. Modularity is a measure of community clustering, with high modularity indicating the density of connections within communities. A network is modular insofar as the modules within it are densely connected, while connections between modules are sparser. A previous study suggested that a modularity larger than 0.3–0.4 is a clear indication that the subgraphs of the corresponding partition are modules [12]. Therefore, we used the cutoff of 0.4 to identify the communities that are in well-defined modules within the PCSA.
SD: standard deviation; IQR: interquartile range.
Discussion
This study builds on prior research using Medicare data to identify team-based primary care by SNA [2]. Instead of using 100% statewide data to identify primary care practices, we examined whether subsets of statewide data can yield similar results. Overall, our results indicate that using 20% sample data with a cutoff of 6 shared patients between two PCPs is favorable. The recall and F-measure were both above 90%, and the ratio of the number of identified communities between this set of data and the 100% statewide data was 1.03. About 86% of communities identified from this set of data overlapped with communities from 100% statewide data. Our findings do not support using 20% sample data with a cutoff of 11 shared patients between two PCPs. Although this higher cutoff yielded higher purity, it was associated with a lower recall measure (<75%), and the ratio of identified communities between this subset of data and statewide data was 0.77.
The use of 100% data within a small geographic area, such as a PCSA, can be an alternative. The ratio of identified communities between this set of data and statewide data was 1.01, and all communities identified from these two sets of data overlapped. These results are expected, because PCSAs were designed to study primary care services [5]. However, the agreement between the communities identified from these data and from statewide data without the PCSA boundary was somewhat lower, with a recall of 80.8% and an F-measure of 83.4%. PCSAs were updated in 2013, whereas our analyses were based on 2015 Medicare data; this gap between data sources might explain the lower agreement.
Although our previous study showed that communities identified from Medicare data by SNA had good agreement with actual practices, their low sensitivity in identifying team-based primary care practices in the population should be recognized [2]. There is a concern that using 20% data to identify dyads between two providers could further reduce sensitivity. However, when using the lower cutoff of sharing at least 6 patients in the 20% data versus the previously recommended cutoff of sharing at least 30 patients in 100% data, more providers met the selection criteria, and the ratio of identified communities between these two sets of analyses was slightly higher than 1.
Our current study showed that the practices identified using this method, with the modularity restriction, tend to be located in areas with larger populations, higher income, and higher market penetration of Medicare Advantage plans. In addition, because the SNA methodology relies on provider dyads, the identified practices were more likely to be in areas with more PCP interaction. These limitations should be recognized when using this methodology, especially in health services research focusing on lower socioeconomic communities.
Our study has other limitations. The analyses were done using Texas Medicare data, which do not include outpatient claims from Medicare Advantage enrollees. In general, we do not expect the sampling error or the determination of PCSA boundaries to vary by state; therefore, our recommendations for using 100% data at the PCSA level or a 20% national sample should not vary by state. However, we do anticipate differences in beneficiaries’ characteristics and provider interactions across states, which may affect the generalizability to other states of the type and number of team-based care practices identified from Medicare data by SNA. Another limitation is that practices located at state boundaries might not be correctly identified.
In conclusion, when researchers would like to study team-based primary care in the US, they could use either 100% data from Medicare beneficiaries in randomly selected PCSAs, or data from a 20% national sample of Medicare beneficiaries, depending on the study purposes. To obtain more precisely identified practices, the 20% national sample is better. To capture a higher proportion of practices, 100% data from PCSAs is better.
Supplementary Material
Acknowledgments
This work was supported by grants R01-HS020642 from the Agency for Healthcare Research and Quality and P30-AG024832 from the National Institutes of Health.
Footnotes
The authors have no financial, personal or potential conflicts of interest to disclose.
References
1. Mitchell P, Wynia M, Golden R, McNellis B, Okun S, Webb CE, Rohrbach V, Von Kohorn I. Core principles & values of effective team-based health care. Discussion Paper. Washington, DC: Institute of Medicine; 2012. www.iom.edu/tbc
2. Kuo YF, Raji MA, Lin YL, Ottenbacher ME, Jupiter D, Goodwin JS. Use of Medicare data to identify team-based primary care – Is it possible? Med Care. 2019;57(11):905–912.
3. Mues KE, Liede A, Liu J, et al. Use of the Medicare database in epidemiology and health services research: a valuable source of real-world evidence on the older and disabled population in the US. Clin Epidemiol. 2017;9:267–277.
4. Dartmouth Atlas Data – Primary Care Service Area (PCSA). National Bureau of Economic Research. https://www.nber.org/data/dartmouth-atlas-primary-care-service-area-pcsa.html. Accessed June 13, 2019.
5. Goodman DC, Mick SS, Bott D, Stukel T, Chang CH, Marth N, Poage J, Carretta HJ. Primary care service areas: a new tool for the evaluation of primary care services. Health Serv Res. 2003;38(1 pt 1):287–309.
6. Xue Y, Goodwin JS, Adhikari D, Raji MA, Kuo YF. Trends in primary care provision to Medicare beneficiaries by physicians, nurse practitioners, or physician assistants: 2008–2014. J Prim Care Community Health. 2017;8(4):256–263.
7. State Practice Environment. American Association of Nurse Practitioners; 2019. https://www.aanp.org/advocacy/state/state-practice-environment. Accessed June 13, 2019.
8. Pons P, Latapy M. Computing communities in large networks using random walks. In: Computer and Information Sciences – ISCIS 2005. Lecture Notes in Computer Science, vol 3733. Berlin, Heidelberg: Springer; 2005.
9. Chejara P, Godfrey WW. Comparative analysis of community detection algorithms. 2017 Conference on Information and Communication Technology (CICT); 2017. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8340627
10. Yang Z, Algesheimer R, Tessone CJ. A comparative analysis of community detection algorithms on artificial networks. Scientific Reports. 2016;6:30750. doi:10.1038/srep30750
11. Newman MEJ. Modularity and community structure in networks. PNAS. 2006;103(23):8577–8582.
12. Fortunato S, Barthélemy M. Resolution limit in community detection. PNAS. 2007;104(1):36–41.
13. University of Wisconsin School of Medicine and Public Health. Area Deprivation Index v2.0. 2015. https://www.neighborhoodatlas.medicine.wisc.edu/. Accessed August 10, 2020.
14. Zaki M, Meira W Jr. Clustering validation. In: Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge: Cambridge University Press; 2014:425–464. doi:10.1017/CBO9780511810114.018
15. R Core Team. R: A Language and Environment for Statistical Computing. 2015. Available at: http://www.R-project.org