BMC Medical Research Methodology. 2025 Aug 20;25:195. doi: 10.1186/s12874-025-02643-w

Visualizing leadership classifications in rectangular data using a basket model and co-word network analysis: a case study of U.S. HCAHPS survey results

Tsair-Wei Chien 1, Willy Chou 2,3,4
PMCID: PMC12366396  PMID: 40836275

Abstract

Background

Structured datasets, such as time-series or survey-based tables, often lack intuitive visualizations that reveal rankings, interrelationships, or leadership dominance among subjects. Traditional parametric statistics fail to capture such relational patterns, especially in ordinal or categorical data. This study proposes a novel nonparametric framework to visualize leadership styles through network diagrams.

Methods

We introduced a three-tier “basket model”—cells (small baskets), columns (medium baskets), and the full table (large basket)—to transform rectangular data into weighted co-word matrices. Using publicly available 2023 HCAHPS survey data from 52 U.S. states and territories, we applied a follower-leader clustering algorithm (FLCA) implemented in R. Leadership was classified into three types: absolute, relative, and no advantage. Network visualizations were generated using Sankey-style diagrams to highlight dominance and inter-cluster relationships.

Results

The weighted approach successfully identified Nebraska as the top leader in the upper 20 states and District of Columbia as a cluster leader among the bottom 20 after data inversion. The network diagrams effectively differentiated between absolute dominance (single strong cluster), relative dominance (sub-cluster formations), and no dominance (multiple independent clusters). Compared to traditional bar charts and choropleths, the method provided deeper insights into inter-state performance dynamics.

Conclusion

This study offers an innovative method for visualizing rankings and leadership patterns in rectangular datasets. By combining a multi-level basket model with co-word network analysis and open-source R scripts, users can quickly generate interpretable, cluster-based dominance diagrams. The approach is scalable, customizable, and applicable to a variety of fields, including healthcare, education, and public policy. Future work may extend this model to dynamic visual tools and broader interdisciplinary applications.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-025-02643-w.

Keywords: Leadership structure classification, Rectangular data visualization, Co-word network analysis, Basket model, HCAHPS survey

Highlights

  • Introduced a novel basket model to transform rectangular survey data into co-word networks for intuitive visualization of leadership and subject relationships.

  • Classified leadership styles into absolute, relative, and no advantage using a follower-leader clustering algorithm applied to weighted table-frequency data.

  • Demonstrated practical application through R scripts and Sankey diagrams, offering fast and scalable tools for visual analysis in healthcare and public policy.

Background

Researchers often publish structured datasets to support replication and secondary analysis. While these datasets are rich in numerical content, they often lack intuitive visualizations that reveal ranking patterns, co-occurrence relationships, or leadership dominance among entities (typically represented by rows). This limitation is especially apparent in time-series or survey-based rectangular data, where rows denote subjects (e.g., countries, hospitals) and columns represent time points or questionnaire items.

Although ranking visualizations such as bar charts, bump charts, and alluvial diagrams are widely used to display comparative performance over time [1–3], they rarely convey co-occurrence or relational dominance between entities. Similarly, traditional correlation-based networks (e.g., those based on cosine similarity or Euclidean distance) have been applied in bibliometrics and survey analysis [4, 5], but these emphasize structural similarity rather than rank-based dominance or leadership. Thus, there remains a gap in methods that can transform rectangular data into interpretable network graphs that both rank and cluster entities based on their frequency, co-occurrence, and dominance.

Ranking based on frequency above the mean or median

Competitions among hospitals or institutions are comparable to sports ranking systems (e.g., basketball, baseball, badminton, Go, or Olympic medal rankings) [6–9]. For example, selecting the top five performers each year and organizing them into a matrix (where rows represent years and columns represent subjects) yields a rectangular dataset that can be transformed into a co-word-style network. Here, node size represents selection frequency, while edge width captures co-occurrence across time or contexts.

Weighted methods, such as those inspired by Olympic medal tallies [9], reflect repeated occurrences per cell and allow finer distinctions between entities. These methods reduce ties in ranking but are rarely translated into dynamic co-occurrence networks. Existing visualizations often neglect the temporal or item-based structure embedded in the matrix, making it difficult to observe leadership emergence or dominance dynamics.

While prior studies have advanced ranking visualizations and network clustering techniques, they often focus on static ranking views [10, 11] or structural similarity and modularity in co-word and bibliometric networks [12–14]. These methods typically overlook rank-based leadership dominance across time or items and do not offer an integrated framework for transforming rectangular data into interpretable, dominance-classified networks. Our research uniquely addresses this gap by combining frequency-based ranking, co-occurrence analysis, and cluster-informed leadership structure classification into a coherent, automated visualization pipeline.

Converting time-series and survey data

Rectangular matrices—such as 12-month hospital performance data or survey responses—can be converted into co-occurrence structures by applying a threshold rule (e.g., top-N or above-median selection per column). Values in each cell (the “small basket”) are grouped per column (the “medium basket”) and aggregated across columns into a “large basket” to build a weighted co-occurrence network.

For instance, national datasets of top 10 cancers or top-rated hospitals can be analyzed using co-word logic. Similarly, survey data with ordinal or categorical values can be analyzed using either unweighted (binary presence) or weighted (rounded scores) modes, depending on the structure of each cell.

However, converting such datasets into network structures that reflect dominance, not just similarity or frequency, remains a methodological challenge. Our approach introduces a three-tier basket model to resolve this.
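As a minimal illustration of the unweighted (binary presence) mode described above, the following Python sketch applies an above-median rule to each column. It is not the authors' R script: the function name `above_median_presence` and the toy table are hypothetical, and the strict "greater than the median" comparison is an assumption.

```python
from statistics import median


def above_median_presence(table):
    """Return {column: set of row names whose score exceeds the column median}.

    table: {row_name: {column_name: score}} (a small rectangular dataset).
    """
    columns = sorted({col for scores in table.values() for col in scores})
    presence = {}
    for col in columns:
        col_median = median(scores[col] for scores in table.values())
        # Binary presence: a row is kept in this column's basket only if it
        # scores strictly above the column median (assumed threshold rule).
        presence[col] = {
            row for row, scores in table.items() if scores[col] > col_median
        }
    return presence


# Hypothetical toy data: rows are subjects, columns are survey items.
toy = {
    "A": {"Q1": 80, "Q2": 70},
    "B": {"Q1": 75, "Q2": 90},
    "C": {"Q1": 60, "Q2": 85},
    "D": {"Q1": 90, "Q2": 60},
}
presence = above_median_presence(toy)
print(presence)
```

Each column's set plays the role of a medium basket in which every retained subject appears once; a top-N rule could be substituted for the median without changing the rest of the pipeline.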

Visualizing co-word networks

Consider the scenario of a frontline clerk or policymaker attempting to visualize leadership from a survey-based dataset. While modern tools (e.g., spreadsheet visualizations, ChatGPT [15]) may assist in generating static heatmaps or bar charts, they do not offer mechanisms to render leadership dynamics or cluster-based dominance relationships. Commercial software also lacks templates for mind-map or network-style leadership diagrams that feature a clear, single-link structure between vertices, avoiding the clutter and complexity of traditional multi-edge network charts.

This study addresses this gap by developing R-based tools [16] that automate the transformation of structured rectangular data into co-word-style networks, providing intuitive and interactive visualizations of leadership structure classifications.

Research questions

This study seeks to answer the following questions:

  • RQ1: How can rectangular datasets (e.g., time-series or survey matrices) be effectively transformed into co-word networks using weighted or unweighted basket models?

  • RQ2: Can such network transformations reveal leadership dominance and subject associations more intuitively than traditional statistical or correlation-based methods?

  • RQ3: How can R-based visualization tools be developed to automate this transformation process and support practitioners in interpreting structured datasets?

Study objectives

The objective of this study is to develop and validate a visual analysis framework that transforms rectangular survey or time-series datasets into intuitive co-word network graphs, thereby enabling clearer identification of leadership dominance (the top-ranked vertex, with all other vertices in a single cluster) and subject associations (edges in the network).

Hypotheses:

  • H1: Rectangular datasets containing ordinal or frequency-based values can be effectively converted into interpretable co-word networks using a three-level basket model, enabling the identification of absolute, relative, and no-dominance leadership patterns among entities.

  • H2: Visualized co-word networks generated via the basket model will provide greater interpretability of leadership relationships than traditional statistical summaries (e.g., means, standard deviations).

Methods

Study data

We downloaded the 2023 summary analysis of HCAHPS survey results on inpatient perceptions of hospitalization experiences across 52 U.S. states and territories (including Washington, D.C., and the U.S. Virgin Islands) from the official HCAHPS website [17]. The publicly available spreadsheet includes 10 dimension scores ranging from 0 to 100, with higher scores indicating better performance (see Online Appendix 1).

There are ten question domains: Comm. with Nurses, Comm. with Doctors, Responsiveness of Hospital Staff, Comm. About Medicines, Cleanliness of Hospital Environment, Quietness of Hospital Environment, Discharge Information, Care Transition, Hospital Rating, and Recommend the Hospital.

All data are publicly available from the HCAHPS website [17] and are included in Online Appendix 1 without any participant identification information; thus, ethical approval was waived.

Three steps with a focus on research designs

This study comprises three key components:

Basket model: mathematical structure of the R code

A. Data model definition (basket model)

Assume the entire dataset is represented as a matrix $M \in \mathbb{R}^{n \times m}$, where:

  • n: number of row terms (e.g., countries, hospitals),

  • m: number of column units (e.g., months),

  • $M_{ij}$: the score of the i-th term in the j-th column (small basket),

  • j∈{1,…,m}: the column index (medium basket),

  • i∈{1,…,n}: the row index (terms),

  • M: the entire matrix (large basket) — representing all the data.

B. Mode selection and term conversion rule (weighted mode)

In the weighted mode, the frequency of term $x_i$ (row i) in column j is calculated as:

$$f_{ij} = \mathrm{round}(M_{ij}),$$

where $M_{ij}$ may include decimal values and rounding ensures discrete frequency counts for co-occurrence analysis.

The multiset of terms for column j is:

$$T_j = \{\, \underbrace{x_i, \dots, x_i}_{f_{ij}\ \text{times}} : i = 1, \dots, n \,\}$$

(This is a multiset where each term can appear multiple times based on its rounded score.)
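The weighted conversion rule can be sketched in Python as follows. This is an illustrative sketch rather than the authors' R code: the function name `column_multisets` and the toy matrix are hypothetical, and Python's built-in `round()` (which rounds exact halves to the nearest even integer) is assumed to be an acceptable stand-in for the rounding step.

```python
from collections import Counter


def column_multisets(matrix, row_names):
    """Build one multiset (medium basket) per column of a score matrix.

    matrix: list of n rows, each a list of m scores (the large basket M).
    row_names: list of n term names, one per row.
    Returns a list of m Counters mapping term -> rounded frequency f_ij.
    """
    n_columns = len(matrix[0])
    baskets = []
    for j in range(n_columns):
        basket = Counter()
        for name, row in zip(row_names, matrix):
            f_ij = round(row[j])  # small basket: rounded cell value
            if f_ij > 0:
                basket[name] = f_ij  # the term is repeated f_ij times
        baskets.append(basket)
    return baskets


# Hypothetical toy matrix: 3 terms (rows) by 2 columns.
rows = ["A", "B", "C"]
M = [
    [3.2, 1.7],
    [2.6, 0.4],
    [0.9, 2.1],
]
baskets = column_multisets(M, rows)
print(sorted(baskets[1].elements()))
```

A `Counter` is used because it stores each term once with its multiplicity, while `elements()` still expands the multiset when the repeated terms themselves are needed.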

C. Mathematical definition of nodes and edges

(i) Node dataset: nodes

For the term set T, define the total frequency of each term $x_i$ across all columns as:

$$\mathrm{nodes}(x_i) = \sum_{j=1}^{m} f_{ij}$$

The total sum of weighted node frequencies is:

$$\sum_{i=1}^{n} \mathrm{nodes}(x_i) = \sum_{i=1}^{n} \sum_{j=1}^{m} f_{ij}$$

Note: In unweighted mode, $f_{ij} \in \{0, 1\}$; in weighted mode, $f_{ij}$ is the rounded score.

Thus, the total number of observations (i.e., $\sum_{i=1}^{n} \sum_{j=1}^{m} M_{ij}$) approximately equals the total weighted node frequency.
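The node totals and the approximate equality noted above can be checked on a small example. The sketch below is illustrative only; `node_frequencies` and the toy matrix are hypothetical names and values, not part of the published scripts.

```python
def node_frequencies(matrix, row_names):
    """Sum the rounded scores f_ij across all columns for each term."""
    return {
        name: sum(round(value) for value in row)
        for name, row in zip(row_names, matrix)
    }


# Hypothetical toy matrix: 3 terms (rows) by 2 columns.
rows = ["A", "B", "C"]
M = [
    [3.2, 1.7],
    [2.6, 0.4],
    [0.9, 2.1],
]
nodes = node_frequencies(M, rows)
total_weighted = sum(nodes.values())             # sum of all f_ij
total_raw = sum(value for row in M for value in row)  # sum of all M_ij
print(nodes, total_weighted, round(total_raw, 1))
```

Because each cell is rounded independently, the two totals differ only by the accumulated rounding error, which is why the equality is approximate rather than exact.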

(ii) Edge dataset: data (co-occurrence frequencies)

At each time point (column) j, generate all unordered pairs of co-occurring terms:

$$E_j = \{\, (x_a, x_b) : a < b,\ f_{aj} > 0,\ f_{bj} > 0 \,\}$$

The full set of all edges is:

$$E = \bigcup_{j=1}^{m} E_j$$

The co-occurrence frequency of each pair $(x_a, x_b)$ is:

$$\mathrm{edges}(x_a, x_b) = \sum_{j=1}^{m} \mathbb{1}\big[(x_a, x_b) \in E_j\big]$$

where $\mathbb{1}[\cdot]$ is an indicator function:

$$\mathbb{1}\big[(x_a, x_b) \in E_j\big] = \begin{cases} 1, & \text{if } (x_a, x_b) \in E_j, \\ 0, & \text{otherwise.} \end{cases}$$

So if $x_a$ and $x_b$ co-occur in column j, add 1; otherwise, add 0. Summing across all columns gives the total co-occurrence count (Fig. 1).

Fig. 1. Basket model applied to this study
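The edge definition can be illustrated with a short Python sketch that counts, for every unordered pair of terms, the number of columns in which both are present. The function name `cooccurrence_edges` and the toy frequencies are hypothetical; treating a term as present when its rounded frequency is positive is an assumption made for this sketch.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(frequencies):
    """Count the columns in which each unordered pair of terms co-occurs.

    frequencies: {term: [f_i1, ..., f_im]} with rounded scores per column.
    Returns a Counter mapping (term_a, term_b) -> number of shared columns.
    """
    n_columns = len(next(iter(frequencies.values())))
    edges = Counter()
    for j in range(n_columns):
        # Terms present in column j (assumed: rounded frequency above zero).
        present = sorted(t for t, f in frequencies.items() if f[j] > 0)
        for pair in combinations(present, 2):
            edges[pair] += 1  # indicator function contributes 1
    return edges


# Hypothetical toy frequencies: 3 terms by 3 columns.
f = {"A": [3, 2, 0], "B": [3, 0, 1], "C": [1, 2, 4]}
edges = cooccurrence_edges(f)
print(dict(edges))
```

Sorting the present terms before pairing keeps each pair in a fixed order, so (A, C) and (C, A) are never counted as separate edges.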

FLCA: algorithm for cluster-based leadership structure detection

We formalized the Follower-Leader Clustering Algorithm (FLCA) [18] into a five-step procedure, illustrated in Fig. 2, which includes inputs, initialization, stepwise logic for assigning clusters, and final output generation.

Fig. 2. Flowchart of the Follower-Leader Clustering Algorithm (FLCA) as implemented in R

Let:

  • G=(V, E) be an undirected graph, where:

    • V={v1, v2,…, vn} is the set of nodes (entities such as countries or institutions),
    • E={(vi, vj, wij)} is the set of weighted edges, where wij is the strength of association (e.g., co-occurrence frequency or similarity score).

Let C={C1, C2,…, Ck} be the resulting clusters.

Step 1

Leadership Identification.

  • Define the leader score L(v) of node v∈V as the sum of its weighted connections:

$$L(v) = \sum_{u \in V,\ u \neq v} w_{vu}$$

Here, the leader score L(v) is not based on network centrality; it is instead computed as the sum of original values (e.g., frequencies or scores) across all columns for each entity in the rectangular dataset.

  • Sort all nodes in descending order of L(v), and label the top-ranked node as the head leader (global leader), denoted L1.

Step 2

Follower assignment.

Each non-leader node v∈V is assigned to only one leader Li, defined as:

$$\mathrm{leader}(v) = \arg\max_{L_i} w_{v L_i}$$

That is, each follower links to its strongest connected leader.

Constraint 1: Each follower can belong to only one cluster, either that of its immediate leader or the one reached by following links toward the cluster leader.

Step 3

Cluster validation rules.

Let cluster Ci be formed by:

$$C_i = \{L_i\} \cup \{\, v \in V : v \to L_i \,\}$$

Apply these rules:

  1. Minimum followers for a leader (excluding the head leader): |Ci| ≥ 3 and |{v | v→Li}| ≥ 2. That is, a leader (not the global head) must have at least 2 followers, and the cluster size must be at least 3.

  2. If the rule is not met, the leader is demoted to a follower and reassigned to another cluster.

Final output

Each valid cluster Ci satisfies [18]:

  • Contains a leader and at least 2 followers,

  • No follower belongs to more than one cluster,

  • Total clusters k depends on how many leaders pass the criteria.

The FLCA implemented in R script (Online Appendix 2-B03) is summarized in Fig. 2.

The algorithm begins with input data in rectangular form and a corresponding edge list with weighted co-occurrence scores (WCD). The top N nodes are selected and assigned temporary cluster IDs. Through an iterative process of leader-follower reassignment and cluster stabilization, the algorithm produces a final node list with cluster memberships and a network-level leadership classification (AA = Absolute Advantage, RA = Relative Advantage, NA = No Advantage).
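The three steps above can be approximated by the following simplified Python sketch. It is not the R implementation in Online Appendix 2: the function name `flca_sketch`, the `n_candidates` parameter (the number of top-ranked nodes initially tried as leaders), the tie-breaking rule, and the toy data are all assumptions introduced for illustration.

```python
def flca_sketch(scores, weights, n_candidates):
    """Simplified follower-leader clustering (illustrative only).

    scores: {node: leader score L(v)}
    weights: {(u, v): w} undirected edge weights, stored in either key order
    n_candidates: hypothetical number of top-ranked nodes tried as leaders
    Returns {leader: sorted list of followers}.
    """
    def w(u, v):
        return weights.get((u, v), weights.get((v, u), 0))

    # Step 1: rank nodes by leader score; the first one is the head leader.
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    head = ranked[0]
    leaders = ranked[:n_candidates]

    while True:
        # Step 2: each non-leader follows its most strongly connected leader
        # (ties go to the higher-ranked leader, an assumed rule).
        clusters = {leader: [] for leader in leaders}
        for v in ranked:
            if v not in clusters:
                best = max(leaders, key=lambda l: (w(v, l), -ranked.index(l)))
                clusters[best].append(v)
        # Step 3: a non-head leader needs at least 2 followers; otherwise the
        # lowest-ranked failing leader is demoted and followers are reassigned.
        invalid = [l for l in leaders if l != head and len(clusters[l]) < 2]
        if not invalid:
            return {l: sorted(members) for l, members in clusters.items()}
        leaders.remove(invalid[-1])


# Hypothetical toy network with six nodes.
scores = {"A": 10, "B": 8, "C": 6, "D": 5, "E": 4, "F": 3}
weights = {
    ("A", "C"): 5, ("A", "D"): 4, ("A", "B"): 2,
    ("B", "E"): 6, ("B", "F"): 3, ("C", "B"): 1,
}
clusters = flca_sketch(scores, weights, n_candidates=3)
print(clusters)
```

In this toy run, C starts as a candidate leader but attracts no followers, so it is demoted and joins the cluster of A, its most strongly connected remaining leader. The loop always terminates because every pass either returns or removes one non-head leader.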

Network to display dominance style

Leadership structure classification

These three leadership types (absolute, relative, and no dominance) provide a practical framework for interpreting how leading entities dominate, compete, or share influence within and across clusters in the network.

Figure 3 illustrates these three types of dominance with examples drawn from the dataset, highlighting distinct leadership patterns based on co-occurrence and clustering results, including (1) AA (Absolute Advantage): A single dominant cluster led by one top-ranked entity; (2) RA (Relative Advantage): Sub-clusters exist within the main structure, indicating partial dominance; (3) NA (No Advantage): Multiple disconnected or weakly connected clusters, indicating a distributed leadership pattern.

Fig. 3. Visual representation of three leadership dominance types

R scripts for network

(A) R scripts

Two datasets were created from the rectangular data with the R script (Online Appendix 2-A02a): nodes with values, and edges between nodes with weighted connection counts (e.g., nodes and edges in Table 1). This analytical approach (with small and medium baskets for each column in Table 1) overcomes the traditional limitation of listing only the Top N terms (e.g., using bar charts to display frequency distributions of entities with the node data only). It can be regarded as a form of semantic network or repeated-observation analysis of terms with relation data, offering additional insights into the network relationships among terms and the dominant influence of leading entities.

Table 1. Basket model with 3 baskets and 2 datasets applied to this study

| Attribute | Element | Meaning |
|---|---|---|
| A. Baskets | | |
| Large (a) | $M_{ij}$ | Score of term i in column j (e.g., frequency) |
| Small (b) | $f_{ij}$ | Rounded value of $M_{ij}$ (weighted count) |
| Medium (c) | $T_j$ | Multiset of terms in column j (based on $f_{ij}$) |
| Vertices (d) | nodes($x_i$) | Total frequency of term i across all columns |
| Relations (e) | edges($x_a$, $x_b$) | Count of columns where $x_a$ and $x_b$ co-occur |
| B. Applications | | |
| Co-word analysis in bibliometrics | | |
| Disease-symptom or drug-gene interaction networks | | |
| Market basket analysis in retail (e.g., products bought together) | | |
| Temporal co-occurrence pattern mining (e.g., trend emergence) | | |

(a) A row is a term (e.g., “Taiwan”), a column is a context (e.g., “March 2024”), and the matrix value is the frequency or weight (e.g., 5 publications).

(b) For weighted analysis (e.g., a term occurs 5 times), the weight is simulated by adding the term 5 times to the basket.

(c) The multiset $T_j$ is created for each column j, where terms may repeat based on $f_{ij}$.

(d) Mode selection in each column is based on the top n(row)/2.

(e) Contains only the primary links of each follower to its leader, as ranked in Vertices.

Based on the principle of the follower-leader clustering algorithm (FLCA) [18], this script (Online Appendix 2-A02b) reads both the primary dataset and the relationship dataset (e.g., nodes and edges in Table 1) to classify members and generate the corresponding Sankey diagram code [19, 20].

(B) Sankey-type networks

Paste the generated Sankey diagram code into the website [14] to produce the visual network graph, as shown in Fig. 2; see the demonstration with the video (Online Appendix 2-A05a).
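For readers unfamiliar with the text format that SankeyMATIC accepts, the sketch below writes one flow per line in the form `Source [Amount] Target`. It does not reproduce the output of the authors' script: the function name `sankey_lines`, the follower-to-leader flow direction, and the toy clusters are assumptions.

```python
def sankey_lines(clusters, weights):
    """Emit one 'Source [Amount] Target' line per follower-to-leader link.

    clusters: {leader: list of followers}
    weights: {(u, v): w} undirected edge weights, stored in either key order
    """
    def w(u, v):
        return weights.get((u, v), weights.get((v, u), 0))

    lines = []
    for leader in sorted(clusters):
        for follower in clusters[leader]:
            # Assumed direction: each flow runs from follower to leader.
            lines.append(f"{follower} [{w(follower, leader)}] {leader}")
    return lines


# Hypothetical toy clusters and edge weights.
clusters = {"A": ["C", "D"], "B": ["E", "F"]}
weights = {("A", "C"): 5, ("A", "D"): 4, ("B", "E"): 6, ("B", "F"): 3}
lines = sankey_lines(clusters, weights)
print("\n".join(lines))
```

Keeping a single link per follower matches the single-link structure described in the Background, which avoids the clutter of drawing every co-occurrence edge.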

Features of the Sankey network graph

The Sankey diagram is especially effective at illustrating relationships and dominance patterns among entities (e.g., attributes or subjects). In our context, each node represents a subject (e.g., state or hospital) at a particular time point, with node size reflecting the total number of associations. Links between nodes indicate co-occurrence or transitions, with link width proportional to the frequency or weight of those connections.

Leadership dominance styles—classified as AA (absolute advantage), RA (relative advantage), or NA (no advantage)—can be interpreted visually by examining the number and structure of clusters in the network:

  • AA: A single, centralized cluster with one dominant leader.

  • RA: Sub-clusters nested under a dominant cluster, indicating partial leadership.

  • NA: Multiple disconnected clusters, showing no single dominant entity.
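One possible way to turn these visual rules into a programmatic check is sketched below. The rule used here (AA for a single cluster, RA when every other cluster leader attaches directly or indirectly to the head leader, NA otherwise) is an interpretation of the descriptions above, not the classification logic of the published R script; `classify_leadership` and its arguments are hypothetical.

```python
def classify_leadership(cluster_leaders, leader_links, head):
    """Classify a clustered network as 'AA', 'RA', or 'NA' (illustrative).

    cluster_leaders: list of leaders of the valid clusters
    leader_links: {leader: higher-ranked leader it attaches to, or None}
    head: the head (global) leader
    """
    if len(cluster_leaders) == 1:
        return "AA"  # a single cluster under one dominant leader

    def reaches_head(leader):
        seen = set()
        while leader is not None and leader not in seen:
            if leader == head:
                return True
            seen.add(leader)
            leader = leader_links.get(leader)
        return False

    if all(reaches_head(leader) for leader in cluster_leaders):
        return "RA"  # sub-clusters nested under the dominant cluster
    return "NA"      # at least one cluster is disconnected from the head


# Hypothetical examples.
print(classify_leadership(["A"], {}, "A"))
print(classify_leadership(["A", "B"], {"B": "A"}, "A"))
print(classify_leadership(["A", "B"], {"B": None}, "A"))
```

The `seen` set guards against cycles in `leader_links`, so malformed input cannot cause an infinite loop.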

Compared to traditional bar charts or word clouds, Sankey diagrams offer not only magnitude comparisons but also rich visual context for inter-entity relationships and leadership structures. A complete step-by-step example of co-word occurrences using a small dataset is provided in Fig. 3 for clearer understanding. A step-by-step demonstration of how to conduct this study is provided in the MP4 video included in Online Appendix 2-A05a.

Important parameters in R script

In our R-based analysis, several key parameters were defined to control the data filtering and visualization process. To focus the Sankey diagram on the most influential nodes, only the top 20 vertices were displayed (ncount <- 20), determined based on the highest values when greaterascriterio <- TRUE. The node importance criterion was set using importance <- 2, where values specify the basis for node ranking—which influences the resulting cluster pattern: edge influence (0), total count (1), or a hybrid approach (2). In this study, the hybrid setting was applied, meaning nodes were clustered based on edge influence, with total counts displayed in parentheses. This approach highlights the close relationship between edge influence and total count—since edges are derived from shared values in smaller baskets (i.e., cells), a higher edge value generally corresponds to a greater total count.

The analysis also accounted for edge weight (weightcount <- TRUE), and from each dimension or column of the data, the top five elements were retained (topelement <- 5) for downstream network construction. The clustering algorithm allowed the detection of a single cluster when specified (FLCAcluster <- 1), relaxing the usual constraint that at least two clusters must exist.

Additionally, the network structure accounted for indirect connections (shiefttarget <- 1), and the normalization of values to a 0–100 scale was disabled (valuefrom0to100 <- FALSE) to preserve the original measurement scale, which is appropriate as long as the values in the data are not too high (i.e., do not exceed 500).
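The effect of retaining the top elements in each column can be illustrated with a small Python sketch. The arguments `k` and `greater_is_better` are hypothetical analogues of the R parameters topelement and greaterascriterio; the function itself and the toy table are not taken from the published scripts.

```python
def top_elements_per_column(table, k, greater_is_better=True):
    """Keep the k best-scoring rows in each column.

    table: {row_name: {column_name: score}}
    Returns {column_name: list of the k retained row names, best first}.
    """
    columns = sorted({col for scores in table.values() for col in scores})
    retained = {}
    for col in columns:
        ranked = sorted(
            table,
            key=lambda row: (table[row][col], row),
            reverse=greater_is_better,
        )
        retained[col] = ranked[:k]
    return retained


# Hypothetical toy data: rows are subjects, columns are survey dimensions.
toy = {
    "A": {"Q1": 80, "Q2": 70},
    "B": {"Q1": 75, "Q2": 90},
    "C": {"Q1": 60, "Q2": 85},
    "D": {"Q1": 90, "Q2": 60},
}
top2 = top_elements_per_column(toy, k=2)
bottom2 = top_elements_per_column(toy, k=2, greater_is_better=False)
print(top2, bottom2)
```

Only the retained rows of each column would then enter that column's medium basket, so a smaller `k` yields a sparser co-occurrence network.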

R software version

All data processing and statistical analyses in this study were conducted using R software version 4.1.3 [16], with RStudio [21] serving as the integrated development environment to support data visualization.

Results

Choropleth maps without leadership dominance

With the R script (Online Appendix 2-A04a), Fig. 4 illustrates the traditional presentation of the 2023 U.S. HCAHPS survey results using choropleth maps.

Fig. 4. Traditional presentation of the 2023 U.S. HCAHPS survey results

Networks with leadership dominance for top 20 U.S. States

Figure 5 presents a heatmap of the top 20 U.S. states based on the 2023 summary analysis of HCAHPS survey results regarding inpatient perceptions of hospitalization. Using a weighted approach (Online Appendix 2-A02a), Nebraska ranks first, showing a relative advantage within its intra-cluster. This highlights the utility of rankings, inter-state relationships, and leadership roles, which are further illustrated in the network visualization in Fig. 6.

Fig. 5. Top 20 US States in HCAHPS survey results 2023

Fig. 6. Top 20 US States in HCAHPS survey results 2023 led by Nebraska

Networks with leadership dominance for bottom 20 U.S. States

Figure 7 displays a heatmap of the bottom 20 U.S. states based on the 2023 summary analysis of HCAHPS survey results on inpatient perceptions of hospitalization.

Fig. 7. Bottom 20 US States in HCAHPS survey results 2023

District of Columbia ranks last, demonstrating a distinct position with a relative advantage within its intra cluster. This further underscores the value of incorporating rankings, inter-state relationships, and leadership roles, as illustrated in the network visualization in Fig. 8. It is worth noting that all scores were transformed using the formula (= max(score) − score) to identify the bottom 20 states prior to executing the R script (Online Appendix 2-A02a), with the parameter greaterascriterio set to FALSE.

Fig. 8. Bottom 20 US States in HCAHPS survey results 2023 led by District of Columbia
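The score inversion used for the bottom-20 analysis (max(score) − score) can be illustrated as follows. The sketch assumes the maximum is taken within a single column, which the text does not specify; `invert_column` and the toy scores are hypothetical.

```python
def invert_column(scores):
    """Replace each score with (column maximum - score)."""
    top = max(scores.values())
    return {name: top - score for name, score in scores.items()}


# Hypothetical toy column of scores.
column = {"A": 80, "B": 75, "C": 60}
inverted = invert_column(column)
lowest_original = max(inverted, key=inverted.get)
print(inverted, lowest_original)
```

After inversion, the entity with the lowest original score carries the largest value, so the same ranking and clustering steps can be reused to find the "leader" among the lowest performers.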

Discussions

Key findings and answers to research questions

This study aimed to address three research questions regarding the transformation of rectangular datasets into interpretable visual leadership networks. Our key findings are summarized below in relation to each question:

  • RQ1: How can rectangular datasets be transformed into co-word networks using basket models?

  • We applied a three-level basket model (cell, column, matrix) to convert rectangular data into co-word structures, enabling the construction of network diagrams that capture frequency and co-occurrence. Weighted and unweighted modes were both implemented using R (Online Appendix 2-A02).

  • RQ2: Can these transformations intuitively reveal leadership dominance and subject associations?

  • The resulting metro-style and Sankey-style graphs successfully visualized absolute, relative, and no-dominance leadership categories, confirmed through case examples.

  • RQ3: Can R-based tools automate this process to support practitioners?

  • Our open-source R script (Online Appendix 2-A02, 2-A04) enables users to paste rectangular data and generate classified co-word network diagrams within 5 s. This workflow bridges theory and practical needs for users in health, education, and policy.

The results of this study support the hypothesis that rectangular datasets containing ordinal or frequency-based values can be effectively converted into interpretable co-word networks using a three-level basket model, enabling the identification of absolute, relative, and no-dominance leadership patterns among entities.

Contributions and novel insights

We developed an integrated R script (Online Appendix 2-A04b) for co-word analysis of survey-based (or time-series) rectangular data, addressing the lack of visualized comparisons and rankings in many open research datasets.

Unlike traditional parametric approaches based on averages and standard deviations, our nonparametric, weighted method uses a configurable top-N threshold (e.g., n = 10 or half the number of entities) to rapidly generate heatmaps and network graphs (see Online Appendix 1). Though the clustering and visualization processes require more technical handling (e.g., creation of the vertices and relation files and clustering (Online Appendix 2-A02a), and Sankey diagram generation (Online Appendix 2-A02b)), our approach offers new visual insights.

The conversion of survey-based (or time-series) data into co-word rectangular matrices, combined with the follower-leader clustering algorithm [18], successfully identifies dominant leaders (“champions”) and their clusters during the observation period. While absolute dominance coefficients [22, 23] quantify dominance strength, they do not capture intra-variable dynamics or distinguish among the three leadership categories. When the coefficient exceeds 0.7, visualizations of leadership type are advised as a supplement (e.g., the empirical data applied in Sect. 4.3).

Key innovations of the weighted approach

  1. Clear leadership interpretation through a three-level basket model:

    • Small baskets (cells), medium baskets (columns), and large baskets.
    • Three leadership structure classifications: absolute, relative, and none.
    • Three cluster types: isolated, intra-cluster, and inter-cluster.

  2. Intuitive Sankey-style network diagrams, where frequency of appearance (left to right) mirrors cumulative scores but adds:

    • Independent column-wise rankings.
    • Co-occurrence relationships.
    • Leadership structure classification insights.

  3. Open-source R scripts (Appendix 2-A02) enable rapid visualization within 5 s of pasting input data, enhancing efficiency and ensuring transparency between theory and practice (see mathematical model in Methods).

  4. Demonstrated applicability in fields such as healthcare management, public health, and education evaluation. Potential extensions include disease surveillance, healthcare quality improvement, and regional resource allocation.

Alignment of innovations with research questions

These innovations directly address the study’s research objectives. The basket model and co-word transformation (Innovation 1) answer RQ1, by providing a systematic method to convert rectangular datasets into interpretable network structures. The classification of leadership structure into absolute, relative, and no dominance, along with the clustering and Sankey visualizations (Innovation 2), offer practical tools for intuitively revealing leadership patterns and relationships, thereby addressing RQ2. Finally, the development of open-source R scripts with rapid visualization capabilities (Innovation 3) ensures the method’s accessibility and usability in applied settings, fully supporting RQ3. Together, these contributions demonstrate that structured data can be transformed into actionable leadership insights using transparent, replicable, and automated processes.

Enhancing reproducibility through interactive demonstration

In addition, to enhance clarity and reproducibility of our procedure, we have incorporated a step-by-step application using a small illustrative dataset. We guide readers to interact with the full R-based workflow via the following steps:

  • Access the R script interface (Online Appendix 2-A04b, which combines the two scripts);

  • Paste a small rectangular dataset (used for illustration) at the beginning of the script input area;

  • Press Enter to execute the R script online (note: FLCA must be enabled for the linkage to work properly);

  • Review the generated outputs, which include the Sankey code (compatible with SankeyMATIC), a Kano diagram, and summary tables for leadership classification.

This tool allows users to directly replicate the procedure and experiment with different parameter settings (e.g., using data from websites [6–9]). By default, the parameters importance <- 2 and topelement <- 5 are applied: clustering is based on edge influence, node size reflects total count, and the top five terms from each column are pooled for co-word analysis within each medium basket (column). Requiring only a few seconds to complete, the interactive demonstration (e.g., Sankey or Kano diagrams) is designed to clarify the procedure and enhance reader comprehension.

Limitations and recommendations

  1. The study used only open-access survey-based data; future studies are encouraged to replicate the approach with other types of tabular data.

  2. While this study emphasizes high-performing U.S. States, readers can modify parameters (Appendix 2-A02a) to explore lower-ranked relationships, as shown in Fig. 8.

  3. The cutoff value (e.g., n = 10 or half the number of rows) may require adjustment based on context. For datasets with many names, set ncount <- 20. For keyword-style (non-time-series) data, use addvalue <- 2 in Script 2 (Online Appendix 2-A02b), and specify the filename and directory (e.g., myfile <- "country.csv", "F:/RR/").

  4. Co-word analysis requires identifiable units and repeated observations. If only one column is available, the weighted method still applies.

  5. While SankeyMATIC offers attractive outputs, the method currently lacks fully integrated R-based visualization.

  6. While this study focuses on rank-based leadership structure classification and applies the FLCA [18] to reveal structured groupings, it does not incorporate more complex community detection methods such as Louvain or Leiden algorithms. These modularity-based approaches could potentially uncover deeper latent structures and overlapping communities within larger or denser networks. Future research may integrate such clustering techniques to enhance pattern recognition and provide complementary insights alongside rank-based leadership identification.

Future work should explore dynamic, interactive visual tools for improved user experience, extend comparisons across subgroups (e.g., departments, regions, age groups), and apply the method to fields beyond health (e.g., education, sports, energy).

Conclusion

This study introduces the basket model and a co-word analysis method that transforms rectangular data into network graphs, uncovering inter-entity relationships and leadership structure classifications. It offers a fast, scalable, and practical tool for health and public policy decision-making, with broad potential for real-world adoption and cross-disciplinary use.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (11.7KB, xlsx)
Supplementary Material 2 (61.4KB, pdf)
Supplementary Material 3 (757.2KB, pdf)

Acknowledgements

We thank Enago (www.enago.tw) for the English language review of this manuscript.

Abbreviations

AA

Absolute advantage

CI

Cluster identifier

DC

District of Columbia

FLCA

Follower-leader clustering algorithm

HCAHPS

Hospital consumer assessment of healthcare providers and systems

NA

No advantage

RA

Relative advantage

R

R programming language

US

United States

Author contributions

TWC initiated the research, collected data, conducted the analysis, and wrote the manuscript. WC contributed to the design of the study and provided critical reviews of the manuscript, and WC and TWC contributed to the interpretation of the results.

Funding

There are no sources of funding to be declared.

Data availability

Data is provided within the manuscript or supplementary information files.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent to publish

Not applicable.

Competing interests

The authors declare no competing interests.

All data are publicly available in the website [11].

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References
