Abstract
Data plays a crucial role in training contemporary AI models, but much of the available public data will be exhausted within a few years, directing the world’s attention toward massive decentralized private data. However, the privacy-sensitive nature of raw data and the lack of incentive mechanisms prevent these valuable data from being fully exploited. Here we propose inclusive and incentivized personalized federated learning (iPFL), which incentivizes data holders with diverse purposes to collaboratively train personalized models without revealing raw data. iPFL constructs a model-sharing market by solving a graph-based training optimization and incorporates an incentive mechanism based on game theory principles. Theoretical analysis shows that iPFL adheres to two key incentive properties: individual rationality and incentive compatibility. Empirical studies on eleven AI tasks (e.g., large language models’ instruction-following tasks) demonstrate that iPFL consistently achieves the highest economic utility and better or comparable model performance compared to baseline methods.
Subject terms: Society, Ethics, Computer science
Private data holds vast potential for AI training but is hindered by privacy concerns and lack of incentives. This work proposes a game-theoretic model sharing market that enables secure, personalized training while incentivizing sharing behavior.
Introduction
Trained on massive publicly available data1–4, AI models have demonstrated significant proficiency in diverse domains5–8. As a well-known representative, ChatGPT6,9 has swept the world with its exceptional ability to solve general tasks. While it is commonly acknowledged that more data leads to better performance10, it has been estimated that available and valuable public data will be exhausted by the year 202611,12, significantly impeding the continued enhancement of AI models under the current training paradigm.
The gradual depletion of public data starkly contrasts with the private sector, where massive institutions separately hold a wealth of valuable data. For instance, financial institutions such as Bloomberg13 possess high-quality private data to train AI models for finance. Ideally, if these institutions collaborate on their resources, they can create a substantial and diverse database capable of augmenting contemporary AI models13–16. Unfortunately, two critical practical issues prevent distributed private data from being fully exploited17,18. Firstly, the sensitivity of private data deters institutions from sharing it readily since this could raise privacy concerns and cause interest conflict17,19–23. Secondly, the absence of a comprehensive incentive mechanism results in a lack of motivation for institutions to actively and willingly engage in collaboration24,25.
Consequently, to enable the utilization of decentralized private data for the continued enhancement of contemporary AI models, it is imperative to establish a harmonious sharing market, which should safeguard privacy and ensure individual interests. In this market, data owners could act as buyers who selectively buy models from others to help train stronger models for their interested tasks, or as sellers who gain revenues from other institutions that have bought their models. Such a guarantee of privacy (i.e., trading models rather than data) and interests can well motivate institutions to participate in the market, forming a virtuous circle as more participants lead to better performance, which in turn attracts more participants.
Following this vision, we adopt personalized federated learning (PFL)26–28 as the technical foundation for model training in this market, due to PFL’s properties on preserving data privacy (i.e., sharing models) and catering to personal interests (i.e., improving personalization performance). In this PFL-based market, coordinated by a central server, participants share their locally trained models to achieve personalization through collaboration29–31. This approach has shown promising personalized performance through techniques like model regularization30, meta-learning28, and clustering32. However, existing methods mainly focus on personalization techniques, overlooking participants’ economic conditions and motivations, which are two key factors in market dynamics.
Therefore, in this paper, we introduce an inclusive PFL system that accommodates individual model preferences and economic conditions, where we specifically consider four types of participants as shown in Fig. 1a. We model the overall system as a graphical game, with participants as nodes and their exchange relationships as asymmetrically weighted edges, enabling a nuanced model-sharing network; see illustration in Fig. 1b. To achieve this, we propose a graph-based PFL optimization objective that captures an individual’s model preference via model similarity and economic conditions via reserving personalized utility functions. Specifically, we pursue personalized models by minimizing loss on interested tasks while maximizing the pair-wise model similarity among participants and the total social welfare within the overall collaboration graph. In this way, participants are allowed to select models based on their preferences and affordability, improving personalization performance, enhancing system robustness against inauthentic models, and promoting cost efficiency.
Fig. 1. Inclusive PFL market and our iPFL.
a. The clients have different purposes for entering a PFL system. A client can be: (i) a trader who simultaneously buys models and sells its own model; (ii) a buyer who only buys models and never shares its own model; (iii) a seller who only sells its own model and never buys models; (iv) an attacker who intends to ruin the system. b. In an inclusive market system, the model and money transactions should satisfy the needs of all the participants and block out attackers. c. In our iPFL, all the market behaviors are completed over a neutral server.
While the graph-based PFL provides the technical foundation, the market’s success also depends on an effective incentive mechanism to motivate participation. This mechanism must fairly reward contributions and ensure those benefiting from contributions compensate accordingly, while also promoting honest participation and deterring dishonest or malicious behavior33,34. To achieve this, we design a payment mechanism in our PFL system (we term our overall system iPFL, where i denotes incentivized and inclusive) that encourages willing and honest participation. This mechanism sets specific prices for model transactions, calculated using game theory principles and considering both the buyer’s economic utility and the seller’s model quality. This ensures mutual benefit from each transaction. Through theoretical analysis, we show that iPFL adheres to two key incentive principles: individual rationality, ensuring that all participants benefit from each training round, and incentive compatibility, incentivizing clients to disclose their true training costs, fostering a collaborative and honest market environment.
To verify the effectiveness of our proposed iPFL (see system overview in Fig. 1), we conduct extensive experiments, covering comprehensive comparisons with baselines, diverse scenarios, and tasks. Results show that iPFL consistently achieves higher economic utility and better or comparable personalization performance compared to state-of-the-art PFL methods. Remarkably, in a scenario of training large language models35, iPFL can achieve 49% higher economic utility and 9% higher model performance than the best baseline method.
Results
Performance evaluation
We use five image and text classification datasets commonly used in the federated learning literature, and four instruction-tuning datasets for training instruction-following large language models. The classification datasets include CIFAR-1036, Fashion-MNIST37, PACS38, FEMNIST39, and Shakespeare39, while the instruction-tuning datasets include three financial datasets (FIQA40, TFNS41, and NWGI42) and a coding dataset43. We compare our algorithm, iPFL, with seven baselines, including two general FL algorithms (FedAvg44 and FedProx45) and five classical PFL algorithms (Ditto30, FedAMP31, CFL32, FedFomo46, and pFedGraph29).
To evaluate the economic performance of iPFL, we introduce the utility function, as defined in Definition 1. It consists of three components: collaboration gain (Eq. (4)) with preference K, the sharing cost with individual unwillingness c and accumulated payment (Eq. (13)) in all rounds. To evaluate model performance, we utilize the classification accuracy metric in classification tasks. For evaluation in instruction-tuning tasks, we utilize the corresponding test dataset for financial clients to evaluate accuracy and Humaneval47 for coding clients to evaluate the passing rate. The baselines, specific settings for K and c, and implementation details are provided in the experimental section in Supplementary information.
For classification tasks, we consider 9 settings with 5 datasets. For CIFAR-10 and Fashion-MNIST, we design three types of data partition among clients, termed NIID, Cluster, and Skew. (i) NIID is a common setting48–51 where local data among clients follows the Dirichlet distribution (default β = 0.1). (ii) Cluster involves random client clustering, with a high heterogeneity level (smaller β) between groups and a low heterogeneity level within groups. (iii) For Skew, the total classes are divided into clusters such that, in each cluster, each client possesses 5 classes. FEMNIST and Shakespeare exhibit natural heterogeneity. In the case of PACS with four domains, each cluster represents one domain, corresponding to the Cluster partition. The evaluation results of our iPFL against eight baselines across nine settings are shown in Fig. 2, emphasizing the trade-off between model performance and economic utility. Our iPFL achieves comparable or even better performance than performance-oriented baselines. Meanwhile, iPFL consistently excels in economic utility, as evidenced by its highest scatter position across diverse settings. Specifically, iPFL outperforms FedAMP by 1.75% in accuracy and 217.4% in utility. Overall, these results show that iPFL effectively strikes a balance between model performance and economic benefits within the personalized federated learning framework.
Fig. 2. Comparison of average utility and accuracy in scatter under different settings.
NIID uses β = 0.1. Cluster forms 3 or 4 groups (two cases) with β = 0.1 among clusters and β = 10 within each cluster. In Skew, each client equally possesses data shards covering 5 classes. Our iPFL achieves comparable or even better model performance and the highest utility across the 9 settings.
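The label-wise Dirichlet (NIID) partition described above can be sketched as follows; this is our own minimal illustration (function name and parameters are assumptions, not the paper's implementation), where a smaller β yields more heterogeneous local label distributions:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, beta=0.1, seed=0):
    """Split sample indices among clients with a label-wise Dirichlet prior.

    For each class, a Dirichlet(beta, ..., beta) draw gives the fraction of
    that class assigned to each client; smaller beta -> more skewed (NIID).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        cls_idx = rng.permutation(np.where(labels == cls)[0])
        props = rng.dirichlet(np.full(n_clients, beta))
        cuts = (np.cumsum(props)[:-1] * len(cls_idx)).astype(int)
        for cid, part in enumerate(np.split(cls_idx, cuts)):
            client_idx[cid].extend(part.tolist())
    return client_idx

# Toy example: 1000 samples over 10 classes split across 8 clients.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, n_clients=8, beta=0.1)
assert sum(len(p) for p in parts) == 1000
```

With β = 0.1 most clients end up dominated by a few classes, whereas a large β (e.g., 10) approaches a uniform split, matching the within-cluster setting above.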
For the instruction-tuning tasks, we consider two scenarios. (i) We configure the Mix-Finance scenario for financial sentiment analysis with six clients, where every two clients share one of the following datasets: FIQA40, TFNS41, or NWGI42. (ii) We consider a more complex Code&Finance scenario to represent a higher heterogeneity level, where five clients possess the code data from CodeAlpaca43 and three clients own the financial data from NWGI.
In the Mix-Finance scenario, we test the results using three different LLMs as the local models: Mistral (7B)52, TinyLlama (1.1B)53, and Llama2 (7B)35; in the Code&Finance scenario, we only use Llama2 as the local model. The results presented in Table 1 demonstrate the superiority of iPFL: it excels in clients’ utility across all scenarios and demonstrates the highest model performance on most tasks when using the two 7B models (Mistral and Llama2). For example, iPFL demonstrates a remarkable 5.55% improvement in accuracy and a 58.3% gain in utility on the scenario with Llama2 compared to other baselines. This dual achievement highlights the effectiveness of our approach in enhancing model performance and economic utility.
Table 1.
Comparisons of model performance (accuracy or passing rate, %) and utility of different algorithms on two instruction-tuning scenarios with three LLMs (Mistral, TinyLlama, and Llama2)
| Setting | Metric | Local | FedAvg | FedProx | FedAMP | CFL | pFedGraph | iPFL |
|---|---|---|---|---|---|---|---|---|
| Mistral (Mix-Finance) | FIQA | 85.83 ± 4.55 | 82.19 ± 4.54 | 85.11 ± 6.61 | 87.65 ± 5.08 | 87.65 ± 3.02 | 87.65 ± 6.11 | 87.64 ± 1.99 |
| | TFNS | 81.09 ± 4.36 | 83.33 ± 0.00 | 75.17 ± 5.66 | 78.75 ± 8.13 | 83.67 ± 2.83 | 83.83 ± 4.24 | 84.83 ± 1.41 |
| | NWGI | 54.92 ± 0.12 | 57.50 ± 0.95 | 58.59 ± 0.12 | 55.00 ± 3.07 | 54.75 ± 2.72 | 56.00 ± 4.00 | 58.67 ± 2.83 |
| | Avg-Utility | 0.0 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 46.1 ± 0.3 | 96.5 ± 0.0 |
| TinyLlama (Mix-Fina.) | FIQA | 82.56 ± 4.02 | 77.09 ± 0.40 | 79.27 ± 1.65 | 83.64 ± 1.46 | 82.20 ± 5.56 | 82.56 ± 4.02 | 82.19 ± 2.47 |
| | TFNS | 76.16 ± 7.07 | 76.25 ± 0.11 | 75.92 ± 0.35 | 77.92 ± 2.47 | 79.33 ± 2.59 | 80.67 ± 0.71 | 78.17 ± 2.12 |
| | NWGI | 48.92 ± 1.76 | 52.67 ± 1.18 | 53.25 ± 0.35 | 47.92 ± 3.66 | 48.42 ± 2.24 | 47.75 ± 0.82 | 50.42 ± 2.47 |
| | Avg-Utility | 0.0 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 96.5 ± 0.0 |
| Llama2 (Mix-Fina.) | FIQA | 84.02 ± 6.09 | 78.19 ± 1.94 | 78.55 ± 0.40 | 84.01 ± 4.03 | 85.11 ± 6.61 | 83.65 ± 5.57 | 85.47 ± 6.10 |
| | TFNS | 80.58 ± 0.83 | 81.25 ± 5.30 | 80.56 ± 6.28 | 76.63 ± 5.48 | 77.06 ± 6.98 | 76.75 ± 5.66 | 83.38 ± 2.83 |
| | NWGI | 43.17 ± 4.48 | 52.25 ± 4.77 | 52.44 ± 1.68 | 42.56 ± 6.98 | 45.94 ± 4.68 | 47.94 ± 3.09 | 56.25 ± 1.06 |
| | Avg-Utility | 0.0 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 96.5 ± 0.0 |
| Llama2 (Code&Fina.) | NWGI | 50.61 ± 2.63 | 49.94 ± 2.46 | 49.61 ± 2.67 | 51.58 ± 1.38 | 52.28 ± 4.97 | 50.00 ± 5.65 | 53.11 ± 0.51 |
| | Code | 13.54 ± 2.30 | 15.00 ± 0.70 | 15.00 ± 1.26 | 14.02 ± 0.86 | 14.15 ± 0.80 | 14.27 ± 1.76 | 15.85 ± 1.14 |
| | Avg-Utility | 0.0 ± 0.0 | 58.2 ± 150.4 | 58.2 ± 150.4 | 58.2 ± 150.4 | 58.2 ± 150.4 | 137.7 ± 291.6 | 208.1 ± 131.1 |
Every value is presented as mean ± standard deviation. The first scenario is the Mix-Finance scenario, consisting of two clients for FIQA, two clients for TFNS, and two clients for NWGI. Each row shows the average performance of the two clients with the same dataset or utility of all clients. The second scenario, Code&Finance, includes three financial clients and five coding clients. Each row shows the average performance of clients with the same task or utility of all clients. Our iPFL consistently outperforms other baselines in clients’ utility and demonstrates the highest average model performance in most of the scenarios.
Incentive properties
In this part, we empirically validate the incentive properties of iPFL, demonstrating its ability to encourage active and honest participation from clients. We begin by verifying individual rationality54,55, which means that every client in the system should obtain positive utility, incentivizing continued participation. Next, we assess incentive compatibility, which guarantees that the system motivates clients to truthfully report their private information and discourages dishonest behavior. Finally, we showcase iPFL’s robustness against model attacks, further deterring the upload of fake models.
Validation of individual rationality
Here, we show the individual client utility distribution on CIFAR-10-Cluster, PACS, and Fashion-MNIST-NIID scenarios in Fig. 3. We compare iPFL with 7 representative baselines. Remarkably, in these scenarios, our proposed iPFL ensures that the utility of each client remains positive, outperforming all the other algorithms. These experiments convincingly verify that our proposed iPFL ensures the property of individual rationality since every participant’s utility is above zero, meaning that they can benefit from the system. Note that we accordingly provide the theoretical guarantee in Theorem 1.
Fig. 3. The utility distribution of the 8 clients with different algorithms under three settings.
Panels (a, b, c) display the utility distribution on CIFAR-10 (Cluster), PACS, and Fashion-MNIST (NIID), respectively. In each box plot, the edges of the box represent the first and third quartiles, the whiskers extend to 1.5 times the interquartile range from the box edges, and the circle indicates the mean utility across all clients. The gray scatter points represent individual client utility values, while the horizontal gray dotted line denotes zero utility. Our iPFL guarantees positive utility for each client and achieves the highest average utility.
Validation of incentive compatibility
Here, we explore the effects of clients being dishonest by considering a scenario where one client lies about its dataset size or training cost. In Table 2, we show the liar’s accuracy and utility over different lying ratios (relative to the true value). The table shows that in our iPFL, the liar always achieves lower or equal accuracy, and significantly lower utility. These results verify that one cannot benefit from lying, which demonstrates the effectiveness of our iPFL in discouraging dishonest behavior and promoting the healthy development of the market. Note that we accordingly provide the theoretical interpretation in Theorem 2.
Table 2.
Liars’ performance (%) and utility comparison under different lying ratios (1 denotes honest) of the true data size or cost, under the NIID setting of CIFAR-10
| Cases | Honest | Lying on data size | | | Lying on cost | | |
|---|---|---|---|---|---|---|---|
| Lying Ratio | 1 | 0.1 | 0.5 | 10 | 2 | 5 | 10 |
| Liar’s Accuracy | 75.430 | 66.933 | 75.331 | 75.430 | 75.430 | 75.430 | 75.430 |
| Liar’s Utility | 617.295 | -513.410 | -9.951 | 0.000 | 0.000 | 0.000 | 0.000 |
Lying on reported private information causes performance degradation and loss of earnings.
Validation of robustness against model attack
In this part, we investigate the robustness of FL algorithms against four distinct types of model poisoning attackers29. Attack strategies include (a) shuffling model updates, (b) flipping the numerical sign of model updates, (c) manipulating model updates with the same value at each element, and (d) manipulating model updates based on random Gaussian noises. Based on the CIFAR-10-Cluster scenario, we conduct one experiment for each attack type, where one attacker is introduced. We compare our iPFL with four representative state-of-the-art PFL algorithms: Ditto30, pFedGraph29, CFL32, and FedAMP31. In Fig. 4, we illustrate the changes in the average performance of benign clients and the utility of malicious clients after being exposed to attack. Notably, only our iPFL succeeds in reducing the utility of the malicious attacker while simultaneously maintaining accuracy for the benign clients. This unique characteristic positions our iPFL as a robust technical foundation for a healthy model-sharing market.
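The four attack types above can be sketched as simple transforms of a flattened model update; the implementations below are our own illustrations (names and default values are assumptions), not the evaluation code of the cited attacks:

```python
import numpy as np

def shuffle_attack(update, rng):
    # (a) Randomly permute the coordinates of the model update.
    return rng.permutation(update)

def sign_flip_attack(update):
    # (b) Flip the numerical sign of every coordinate.
    return -update

def same_value_attack(update, value=1.0):
    # (c) Replace every coordinate with the same constant value.
    return np.full_like(update, value)

def gaussian_attack(update, rng, std=1.0):
    # (d) Replace the update with random Gaussian noise.
    return rng.normal(0.0, std, size=update.shape)

rng = np.random.default_rng(0)
u = np.array([0.5, -1.0, 2.0])
assert np.allclose(sign_flip_attack(u), [-0.5, 1.0, -2.0])
assert np.allclose(same_value_attack(u), [1.0, 1.0, 1.0])
```

A defense such as iPFL's similarity-based neighbor selection can then be judged by whether such transformed updates are excluded from benign clients' collaboration graphs.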
Fig. 4. The change of average benign clients’ performance (%) and malicious client utility after 4 different attack types.
The experiment involves 8 benign clients and an attacker under the Cluster setting on CIFAR-10. For each algorithm, we use the circle (•) to represent the average accuracy of benign clients and utility of the malicious attacker before the attack; We use the star (⋆) to represent the average accuracy of benign clients and utility of the malicious attacker after the attack.
Inclusive market
To verify that our iPFL is inclusive, i.e., that it can accommodate clients with diverse preferences and economic conditions, we simulate a market consisting of clients with diverse roles: traders who buy and sell models, buyers who only buy models, sellers who only sell models, and attackers who try to sell poisoned models (we use a randomly parameterized model). These roles are realized by setting the profiles of clients: we set the level of data eagerness to a random positive value for traders and buyers, and to zero for sellers and attackers; we set the cost to a random positive number for traders, +∞ for buyers, and zero for sellers and attackers; see details in Supplementary material. Finally, we build a market based on the CIFAR-10-Cluster scenario with 12 clients and conduct model training for 20 rounds. We record the accumulated money transactions and the accuracy difference between the model trained by iPFL and local training, shown in Fig. 5. From the figure, we can clearly see that the transactions among the clients align well with their roles (i.e., purposes). The traders buy models from others to obtain higher accuracy and simultaneously sell models to make a profit. The buyers pay others for models to improve their own models, while the sellers earn money by selling models. The attacker is successfully isolated by the others, doing no harm to the market. Overall, the experiments verify that iPFL is an incentivized and inclusive PFL system, since every individual gains benefits from joining the system.
Fig. 5. The transaction graph of market simulation.
The market consists of 3 traders, 4 sellers, 4 buyers, and 1 malicious attacker. Under our iPFL mechanism, three distinct groups are involved in transactions of goods or money, while the malicious attacker is isolated from these groups.
Discussion
In response to the challenges posed by the depletion of publicly available data and the need for collaboration among private institutions, we establish an inclusive sharing market that incentivizes the contributions of diverse participants with unique model preferences and economic conditions. Rooted in the personalized federated learning paradigm, iPFL integrates a graphical game, built on a directed collaboration graph, into the framework. iPFL introduces a multifaceted objective, aiming to minimize loss on relevant tasks, maximize pairwise model similarity, and enhance overall social welfare within the system. Our iPFL framework modifies local training methods to achieve improved personalization, flexibly adjusts collaboration regarding models and economic conditions, and implements a sophisticated payment mechanism. The proposed system, iPFL, facilitates training, collaboration, and transactions to meet each participant’s demands and achieves its incentive properties both theoretically and experimentally. Regarding privacy preservation, iPFL avoids direct data sharing, ensuring effective data isolation. Regarding communication overhead, participants in iPFL only need to additionally report their model preference Ki, cost ci, and data amount Ni at the start of training. These one-time uploads are negligible in communication but significantly improve the ability to balance model performance and economic utility.
Comprehensive experiments reveal several significant findings about our iPFL. First, iPFL demonstrates exceptional versatility in balancing model performance and economic utility across the AI landscape. Extensive experiments, spanning various machine learning tasks and model scales in Fig. 2 and Table 1, highlight its capability to achieve comparable or superior model performance and the consistently highest social welfare. Second, iPFL ensures individual rationality, as every institution involved in the system achieves non-negative benefits (see Fig. 3). This inherent motivation acts as a catalyst, encouraging a growing number of institutions to join the ecosystem. This, in turn, leads to an expansion of the market size, fostering a resilient and extensive database that can further catalyze advancements in AI research. Third, iPFL exhibits a remarkable capability to prevent dishonest practices. Exaggerating data size or cost results in reduced utility (shown in Table 2), acting as a deterrent against dishonest behavior and market fraud. This feature underscores iPFL’s commitment to fostering an environment of honesty and integrity. Fourth, iPFL showcases a robust defense against potential attackers: by effectively isolating malicious participants (Fig. 4), iPFL contributes to a stable and trustworthy market environment. In addition, our inclusive simulation experiment in Fig. 5 further supports these findings. It demonstrates that honest institutions with distinct needs can acquire what they require, showcasing iPFL as an epitome of an actual healthy market.
Through these advancements, iPFL paves the way for a new era in collaborative AI. With iPFL, institutions can not only benefit from personalized models but also actively contribute to and gain from a flourishing, inclusive market while preserving privacy. However, future work remains. One challenge is how to tackle the incentive problem under dynamic FL settings. Though iPFL allows clients to exit before the end of training, we do not permit the addition of new clients during training. New clients, who did not participate in previous training, may introduce both technical and economic challenges. Technically, it is difficult to evaluate the safety of a new client without prior training records, so additional safety measures are necessary to determine whether the client can be approved for future training sessions. Economically, the addition of new clients is unfair to existing clients, as new clients could access the current pre-trained models in the federation without participating in the initial training. In this case, clients would have little incentive to join the training from the beginning. A possible solution is to charge every new client a proper admission fee to remedy its missed rounds. Future research could explore more flexible frameworks that adapt to dynamic states by adjusting model-sharing strategies or pricing mechanisms.
Another potential direction to explore is how to regulate the server. In this work, we assume for simplicity that the server is non-profit. Nonetheless, running the server is costly in some cases, so the clients may have to pay the server to ensure its engagement. One further improvement of iPFL is to have each client pay the server in proportion to its own utility. In this way, the server’s income is proportional to the social welfare, so the server is also motivated to maximize the welfare of the system. However, this scheme still relies on the integrity of the server, meaning the server has no covert collusion with malicious clients. Ensuring the integrity of the coordinator of the whole FL system may be an important topic for future research, especially in PFL, where the server has no stake in the clients’ tasks.
Methods
In this section, we present our problem formulation and our method, iPFL. We first define the PFL problem and the graph-based training loss in PFL. We then formulate the graphical game within our PFL system and state our overall system-level objective. Next, we introduce the details of our method, including the training procedure and payment. Finally, we provide theoretical insights into the incentive properties and robustness of iPFL.
Personalized federated learning problem
We consider the popular PFL setting, where m institutions join the system as clients and are managed by a central server. Each client i holds a private dataset Zi with Ni data points sampled from client i’s local data distribution Di. Each client i maintains its own model parameters θi. Given a common loss criterion l( ⋅ , ⋅ ), the empirical loss of client i on its own dataset Zi is Li(θi) = (1/Ni)∑z∈Zi l(θi, z). The clients hope to train personalized models that perform well on their local data distributions; in this case, the population loss (testing loss) for client i is Ez∼Di[l(θi, z)].
Due to privacy concerns and communication constraints in multi-institutional scenarios, the clients cannot directly send their data to other clients. In our work, we consider the model-sharing strategy in PFL. As a member of the federation, each client can refer to others’ model parameters, coordinated by the server and realized on the server side. Specifically, to describe the model-sharing topology among the clients, we use a directed graph represented by the adjacency matrix A ∈ {0, 1}m×m with:

aij = 1 if client i refers to client j’s model, and aij = 0 otherwise. (1)
We call the clients in the set {j ∣ aij = 1} the neighbors of client i. Then the training loss of the PFL system is defined as:
∑i∈[m] Li(θi) + λ ∑i∈[m] ∑j∈[m] aij d(θi, θj), (2)

where Ni is the number of training samples of client i and λ > 0 serves as a hyper-parameter; d(θi, θj) is the distance function that quantifies the difference between two models θi and θj (e.g., ℓ2-distance, negative cosine similarity). The first term ∑i∈[m] Li(θi) is the sum of the empirical local losses of the clients. The second term represents the model reference among the clients: with pairwise collaboration indicated by the binary indicator aij, if client i refers to j’s model (j is a neighbor of i in the graph), a regularization term is added to the training loss. The regularization term encourages each client to collaborate with its neighbors by pursuing similar models, thereby improving model performance under the neighbors’ data distributions. Thus, the training loss of the PFL system encompasses both the loss on each client’s individual data distribution and the model difference from its neighboring models. This helps train models that suit specific local tasks while maintaining general applicability to other clients’ tasks.
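As a concrete numerical illustration, the following sketch (ours, not the paper's implementation; squared ℓ2-distance is an assumed choice of d) evaluates the graph-regularized training loss for given local losses, flattened model parameters, and a collaboration graph:

```python
import numpy as np

def pfl_training_loss(local_losses, thetas, A, lam=0.1):
    """Graph-based PFL loss: sum of local losses plus a pairwise
    model-distance penalty over the collaboration graph A (a_ij in {0, 1}).

    Uses squared l2-distance as d(theta_i, theta_j); other choices
    (e.g. negative cosine similarity) fit the same template.
    """
    m = len(local_losses)
    loss = float(np.sum(local_losses))
    for i in range(m):
        for j in range(m):
            if A[i, j]:
                loss += lam * float(np.sum((thetas[i] - thetas[j]) ** 2))
    return loss

# Three clients; clients 0 and 1 reference each other, client 2 is isolated.
thetas = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
total = pfl_training_loss([1.0, 1.0, 1.0], thetas, A, lam=0.5)
# local losses 3.0, plus 0.5 * 2.0 for each of the two directed edges -> 5.0
assert abs(total - 5.0) < 1e-9
```

In practice the penalty would be added to each client's local objective during training; the sketch only shows how the graph A couples the clients' models.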
Graphical game in PFL market and overall objective
To accommodate clients’ individual needs, we implement a market system that facilitates trading among them. With the previous graph-based PFL framework, we can use a graphical game model to formulate such a market of PFL. Each client in PFL is taken as a player in this game. We assume that the server has no interest in clients’ tasks and only acts as a neutral coordinator. In the PFL system, the clients can choose their neighbors for model reference, so the action of client i in this game is described by ai (the ith roll of A). After pinpointing the players and actions in this game, we then define the utility of the clients in Definition 1.
Definition 1
(utility) Consider that the clients are sharing models for multiple rounds (in each round t the clients share models according to At). The utility of client i in each round is defined as:
ui^t = Gi(∑j≠i aij^t Nj) − ci ∑j≠i aji^t − pi^t. (3)

The utility function reflects the benefit of each client after each round of model sharing, with three components: the collaboration gain Gi(∑j≠i aij^t Nj), the overall sharing cost ci ∑j≠i aji^t, and the overall payment pi^t. We elaborate on the three components as follows.
Definition 2
(collaboration gain function) The gain function of client i is Gi(x) = gi(x), with gi(0) = 0. The function gi describes the relation between the data value and the data amount for client i.
The collaboration gain represents the value of data resource that client i has gained from its neighbors. gi is a function that reflects the relation between data value and sample data amount. By empirical studies10, gi is usually an increasing and concave function with gi(0) = 0. Specifically, in our experiments, the collaboration gain is:
| 4 |
where Ki is a hyper-parameter that reflects the scale of the gain. The gain scale Ki can vary among the clients, reflecting their different data needs. If Ki = 0, the client cannot benefit from enlarged data access, so it may be a pure model seller and will not buy models from any other client. If Ki > 0, getting more data from the neighbors increases the gain, but the marginal benefit brought by each additional neighbor becomes smaller. If there is no neighbor, the gain is zero, as Gi(0) = gi(0) = 0.
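The exact gain used in the experiments is given by Eq. (4); purely as an illustration, a logarithmic form (our assumption — one example of a function that is increasing, concave, and zero at zero) behaves as described:

```python
import math

def collaboration_gain(K_i, neighbor_sizes):
    """Illustrative concave gain G_i = K_i * log(1 + total neighbor data).

    The log form is only one example satisfying the stated requirements
    (increasing, concave, zero at zero); K_i = 0 models a pure seller.
    """
    return K_i * math.log(1.0 + sum(neighbor_sizes))

assert collaboration_gain(2.0, []) == 0.0   # no neighbors -> zero gain
assert collaboration_gain(0.0, [100]) == 0.0  # pure seller gains nothing
g1 = collaboration_gain(2.0, [100])
g2 = collaboration_gain(2.0, [100, 100])
assert 0.0 < g2 - g1 < g1                   # diminishing marginal benefit
```

The assertions check exactly the properties the text relies on: zero gain without neighbors, zero gain for Ki = 0, and shrinking marginal benefit per extra neighbor.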
Definition 3
(sharing cost) If client i’s model is imported by another client, client i will suffer a loss of ci.
The sharing cost ci represents the decrease in utility when client i’s model is referred to by another client, so the overall cost in round t is ci ∑j≠i aji^t. A higher value of ci indicates that client i is more sensitive to the spread of its models, possibly due to fairness or privacy considerations. By definition, unless the payback is larger than ci, the change in utility is negative, making client i reluctant to share its model. From this perspective, ci can also be taken as a minimum price to share the model. We therefore introduce money transactions to overcome this barrier to collaboration.
Definition 4
(overall payment) The overall payment P_i^t is the amount that client i pays to the system in round t.
The overall payment represents the total monetary expenditure of client i in round t. Since the server is neutral, we only consider monetary transactions among the clients. Denote p_ij^t as the remittance from client i to client j, so that P_i^t = ∑_{j≠i} p_ij^t. Since the monetary transactions are symmetric (p_ij^t = −p_ji^t), we have ∑_i P_i^t = 0.
Our utility function captures both the data-level and economic-level gains of all the clients. A client gains from using others' models and loses utility after disseminating its own. For example, a client may share its model with many other clients but import few models from them. According to the first two terms of the utility, such a client may not be satisfied with the training arrangement, as it receives no proportionate treatment from the federation and its utility is low (low collaboration gain but high sharing cost). To incentivize such clients, the system can increase their utility by assigning economic compensation (reflected by a negative P_i^t).
Based on our evaluation of the clients' utilities, we can define the social welfare of the whole market system, which is the sum of all the clients' utilities and reflects the overall satisfaction of the system.
Definition 5
(social welfare) Denote U_i^t the utility of client i in Eq. (3); the social welfare at round t is defined as:
SW^t = ∑_{i=1}^m U_i^t    (5)
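To make the bookkeeping concrete, the sketch below evaluates Eq.-(3)-style utilities and sums them into social welfare; the logarithmic gain form and all names and numbers are illustrative assumptions.

```python
import math

def utility(i, A, N, K, c, P):
    """U_i = gain(accessible data) - c_i * (# importers of i's model) - P_i.
    A is the 0/1 sharing matrix: A[i][j] = 1 means client i imports j's model."""
    m = len(A)
    accessible = sum(A[i][j] * N[j] for j in range(m) if j != i)
    gain = K[i] * math.log(1.0 + accessible)          # assumed gain form
    cost = c[i] * sum(A[j][i] for j in range(m) if j != i)
    return gain - cost - P[i]

def social_welfare(A, N, K, c):
    """SW = sum of all utilities; payments cancel since sum_i P_i = 0."""
    zero_pay = [0.0] * len(A)
    return sum(utility(i, A, N, K, c, zero_pay) for i in range(len(A)))
```

Because the payments are a zero-sum transfer among clients, they drop out of the social welfare, which depends only on the sharing graph.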
In iPFL, our goal is to build an inclusive PFL system that provides personalized training according to the clients' model-level and economic needs. To achieve this, we propose the following optimization problem:
min_{Θ, A} ∑_{i=1}^m L_i(θ_i) + λ ∑_{i=1}^m ∑_{j≠i} a_ij (N_j/N_i) ∥θ_i − θ_j∥² − SW(A)    (6)
The first two terms are the model-similarity-aware training loss defined in Eq. (2). The third term, SW(A), is the social welfare under the graph A, which can also be viewed as a regularization term that prevents A from degenerating to 0. By minimizing both the loss on local tasks and the model difference from their neighbors, the clients attain personalized models without losing generality. At the same time, our system optimizes social welfare by refining each client's selection of references, which secures the benefits of each client and makes training more economically efficient.
Training procedure
We overview the system in Box 1, which takes T rounds in total to alternately optimize the personalized model θ_i and the neighbor selection a_i for each client, and to assign appropriate payments among clients. Specifically, in each round t, the clients first update and upload their local models θ_i^t. Then, the server calculates the model-sharing graph A^t by optimizing the clients' actions according to the game model; the payment for each client is also calculated from A^t. At the end of each round, each client has two choices: (1) pay and receive the aggregated model for the next round; or (2) quit the federation and take the best model from previous rounds as its final model. The training procedure therefore involves three key steps: local model training at the client side, where personalized models are trained locally; graph topology learning at the server side, where the model-sharing topology is learned from the uploaded local models; and payment calculation at the server side, where the server determines a bill for each client and the clients complete the transaction by paying it.
Local model training
To train a personalized model on collective data, each client updates its model parameters locally by simultaneously minimizing the loss on local tasks and the model-level distance from its selected neighbors' models. Due to privacy and communication constraints, it is infeasible to directly optimize Eq. (6) by gradient methods, so we relax the original problem in Eq. (6) into m sub-problems:
min_{θ_i} L_i(θ_i) + λ ∑_{j≠i} a_ij (N_j/N_i) ∥θ_i − θ_j∥²    (7)
Note that the sum of the sub-objectives in Eq. (7) is the original Eq. (6), with each sub-objective representing client i's local objective. If the server determines that client i has access to client j's model, a_ij is set to 1, so that client i is encouraged to learn from client j during local model training by minimizing the distance between the local model θ_i and its neighbors' models θ_j.
However, directly solving Eq. (7) still requires ∑_{j≠i} a_ij model transmissions for client i, because it needs access to all of its neighbors' local models. To avoid introducing this additional communication cost, we apply the proximal gradient descent method: the server computes Eq. (8) in advance before transmitting information to the clients, and each client optimizes Eq. (9) during local model training in the practical implementation:
θ̂_i^t = θ_i^t − η λ ∑_{j≠i} a_ij^t (N_j/N_i) (θ_i^t − θ_j^t)    (8)
θ_i^{t+1} = argmin_{θ_i} L_i(θ_i) + (1/(2η)) ∥θ_i − θ̂_i^t∥²    (9)
where η is the step size used in the calculation of the proximal center. With this technique, the server only needs to send client i a single proximal center θ̂_i^t at round t instead of multiple models from its neighbors.
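A minimal sketch of the two-step scheme, assuming Eq. (8) is a single gradient step of the weighted model-similarity term and Eq. (9) is the standard proximal objective solved by plain gradient descent; the quadratic local loss, λ, η, and all names are illustrative assumptions.

```python
import numpy as np

def prox_center(theta, A, N, i, eta=1.0, lam=0.1):
    """Server side: one gradient step of the similarity term, producing the
    single vector sent to client i (assumed form of Eq. (8))."""
    grad = sum(A[i][j] * (N[j] / N[i]) * (theta[i] - theta[j])
               for j in range(len(theta)) if j != i)
    return theta[i] - eta * lam * grad

def local_update(theta_i, center, local_grad, lr=0.1, eta=1.0, steps=200):
    """Client side: minimize L_i(theta) + (1/(2*eta))*||theta - center||^2
    (assumed form of Eq. (9)) by gradient descent on the local loss."""
    theta = theta_i.copy()
    for _ in range(steps):
        theta = theta - lr * (local_grad(theta) + (theta - center) / eta)
    return theta
```

For a quadratic local loss L_i(θ) = ½∥θ − target∥² and η = 1, the proximal step converges to the midpoint of the local target and the received center, showing how the center regularizes the local solution toward the neighbors.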
Graph topology learning
The server needs to find a suitable model-sharing graph based on the local models uploaded by the clients. The graph is optimized by minimizing the model distance between neighbors and maximizing the social welfare of the overall system, which corresponds to solving a sub-problem of Eq. (6):
min_{a_i ∈ {0,1}^m, a_ii = 0} φ_i(a_i) = ∑_{j≠i} a_ij (λ (N_j/N_i) ∥θ_i^t − θ_j^t∥² + c_j) − g_i(∑_{j≠i} a_ij N_j)    (10)
where φ_i denotes the objective function of the sub-problem for each client. The problem is an NP-hard integer program, and finding an optimal solution can be very costly. Therefore, we propose an efficient graph learning algorithm (see Box 2) to obtain an approximate solution to this optimization problem. In this algorithm, we calculate a threshold data amount Ñ_ij for each potential neighbor j of client i by solving g_i(Ñ_ij + N_j) − g_i(Ñ_ij) = λ (N_j/N_i) ∥θ_i^t − θ_j^t∥² + c_j. Since the marginal collaboration gain brought by each neighbor decreases with client i's total accessible data amount n, if a_ij = 0 and n < Ñ_ij, then setting a_ij = 1 makes φ_i smaller. Therefore, the algorithm in Box 2 keeps adding the client j with the largest threshold Ñ_ij to the neighbors of client i until n ≥ Ñ_ij. Hence, after the algorithm in Box 2 reaches its termination condition, we have a solution that satisfies:
φ_i(a_i) ≤ φ_i(a_i + e_j) and φ_i(a_i) ≤ φ_i(a_i − e_j) for all j ≠ i    (11)
where e_j is the jth row of the identity matrix. Although this algorithm cannot guarantee a globally optimal solution to Eq. (10), its solution is locally optimal for client i, as adding or removing any single neighbor does not make the objective φ_i smaller. Since the algorithm in Box 2 only needs to calculate m − 1 thresholds and sort them, its time complexity is O(m log m). With this approximation, our graph learning algorithm attains a feasible A^t efficiently and robustly, without spending a large amount of time searching for unnecessary optimality, and it is sufficiently effective in practice.
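Under the illustrative logarithmic gain g_i(n) = K_i ln(1 + n), the threshold equation has a closed form, which the sketch below uses. The per-neighbor "price" (the model-difference term plus c_j) is passed in directly, and all names and numbers are assumptions for illustration, not the paper's exact procedure.

```python
import math

def thresholds(K_i, N, price, i):
    """Threshold T_j solving g_i(T + N_j) - g_i(T) = price[j] for the
    logarithmic gain g_i(n) = K_i*ln(1 + n): T = N_j/(e^{price/K_i} - 1) - 1."""
    out = {}
    for j in range(len(N)):
        if j == i:
            continue
        if K_i <= 0:
            out[j] = 0.0        # pure seller: no neighbor is ever beneficial
            continue
        r = math.exp(price[j] / K_i) - 1.0
        out[j] = N[j] / r - 1.0 if r > 0 else float("inf")
    return out

def select_neighbors(K_i, N, price, i):
    """Box-2-style greedy rule: add clients in decreasing threshold order
    while the accumulated data amount stays below the next threshold."""
    T = thresholds(K_i, N, price, i)
    chosen, n = set(), 0.0
    for j in sorted(T, key=T.get, reverse=True):
        if n < T[j]:
            chosen.add(j)
            n += N[j]
        else:
            break
    return chosen
```

A client whose model is far from client i's (large price) receives a low threshold and is pruned early, which is the mechanism behind the robustness discussion later in this section.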
Payment calculation
According to the definition of client utility in Eq. (3), if client i imports the model from client j, it imposes a cost of c_j on client j. Model sharing is therefore not reciprocal, leading to a dilemma in which some clients lack the incentive to join the training. Hence, after confirming the collaboration graph A^t among the clients, the server needs to determine the required payment for each client, ensuring that contributions from clients are aptly rewarded and that those benefiting from these contributions are appropriately charged. In our payment design, the reward is calculated from the benefit brought by the imported model. The payment is defined as follows: if client i imports j's model, client i pays j the marginal benefit minus the model difference:
p_ij^t = g_i(n_i^t) − g_i(n_i^t − N_j) − λ (N_j/N_i) ∥θ_i^t − θ_j^t∥², where n_i^t = ∑_{k≠i} a_ik^t N_k    (12)
where g_i(n_i^t) − g_i(n_i^t − N_j) is the marginal benefit (change of the gain) brought by j's model and λ (N_j/N_i) ∥θ_i^t − θ_j^t∥² is the model difference term defined in Eq. (6). In this way, the overall payment P_i^t can be written as:
P_i^t = ∑_{j≠i} a_ij^t p_ij^t − ∑_{j≠i} a_ji^t p_ji^t    (13)
This payment policy benefits both client i and client j: client j is paid more than its minimal price, while client i does not lose all of the benefit brought by client j's model. The model transactions in our system are therefore reciprocal, and no client conveys benefits to others for free. Moreover, unlike simply covering client j's cost by setting p_ij^t = c_j, the clients cannot directly affect the payment by manipulating c_i. This significantly reduces both the regret of pricing (i.e., losing money for not setting c_i higher) and greedy behavior (i.e., reporting a higher c_i for more profit).
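A sketch of the per-edge payment and the resulting net bill, again assuming the logarithmic gain; the model-difference value is passed in as dist, and all names and numbers are illustrative assumptions.

```python
import math

def gain(K, n):
    """Assumed logarithmic gain g(n) = K*ln(1 + n)."""
    return K * math.log(1.0 + n)

def pay_to_seller(K_i, n_total, N_j, dist_ij):
    """p_ij: marginal gain that j's data brings to i, minus the
    model-difference term (Eq.-(12)-style)."""
    return gain(K_i, n_total) - gain(K_i, n_total - N_j) - dist_ij

def overall_payment(i, A, K, N, dist):
    """Net bill of client i: payments made minus payments received."""
    m = len(A)
    total = lambda k: sum(A[k][j] * N[j] for j in range(m) if j != k)
    paid = sum(pay_to_seller(K[i], total(i), N[j], dist[i][j])
               for j in range(m) if j != i and A[i][j])
    received = sum(pay_to_seller(K[j], total(j), N[i], dist[j][i])
                   for j in range(m) if j != i and A[j][i])
    return paid - received
```

Since every remittance appears once as a payment and once as a receipt, the bills sum to zero over all clients: money only moves between clients, never into or out of the system.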
Box 1 Overview of iPFL.
input: m clients, each with a local private dataset for training.
Server sends an initial model θ0 to every client.
for i = 1 to m do
Client i performs local training and obtains θ_i^1.
Client i reports Ni, ci, Gi( ⋅ ) back to Server.
end for
for t = 1 to T − 1 do
for i = 1 to m do
Client i reports θ_i^t back to Server.
end for
for i = 1 to m do
Server calculates a_i^t by Box 2. // Graph Topology Learning
end for
for i = 1 to m do
Server calculates P_i^t according to A^t. // Payment Calculation
if Client i pays P_i^t to Server then
Server calculates the prox-center θ̂_i^t by Eq. (8).
Server sends the prox-center θ̂_i^t to Client i.
Client i updates θ_i^{t+1} by Eq. (9). // Local Model Training
else
Client i quits and takes the best θ_i^s (s ≤ t) as its final model.
end if
end for
end for
output: θi, i ∈ [m] for each client.
Theoretical Insights
In this section, we provide theoretical discussion of the special properties of our system. First, we show that our system is individually rational: clients joining the federation obtain higher utility than by training locally. Theorem 1 ensures that all the clients have non-negative utility (the proofs for this section are in the Supplementary Materials), so the clients are incentivized to join and stay in the training.
Theorem 1
(Individual Rationality) If A^t is given by the algorithm in Box 2, then U_i^t ≥ 0 for all i ∈ [m].
Second, we discuss the incentive compatibility of iPFL: the goal of the system is aligned with the clients' benefits, so clients are incentivized to act honestly. Specifically, we can prove incentive compatibility with respect to the reported cost c_i. Lemma 1 shows that increasing c_i results in less sharing of client i's model, so the training arrangement is compatible with the clients' desires.
Lemma 1
(Incentive Compatibility of c_i) Denote by A^t the graph calculated by the server when everyone honestly reports c_i, and by Ã^t the new graph when client i reports c̃_i > c_i. Then ∑_{j≠i} ã_ji^t ≤ ∑_{j≠i} a_ji^t.
On the basis of Lemma 1, we can further prove Theorem 2, which ensures that the clients cannot obtain additional income by overstating c_i.
Theorem 2
(Incentive Compatibility of c_i) Denote by U_i the one-round utility of client i when everyone honestly reports c_i, and by Ũ_i its utility when client i reports c̃_i > c_i. Then Ũ_i ≤ U_i.
At the same time, if client i reports c̃_i < c_i, it risks selling its model at a low price, and the mechanism cannot ensure U_i ≥ 0. Thus, the clients are encouraged to reveal their true cost c_i to the server. This contributes to harmonious collaboration, because clients do not need to be secretive about their unwillingness to share.
Beyond incentive properties, we discuss our system's robustness against abnormally reported data amounts N_i and model parameters θ_i^t. Benign, quality-aware clients have no reason to be dishonest about N_i and θ_i^t, as lying about them is harmful to their own models: reporting a wrong N_i results in inaccurate model aggregation, and uploading a fake θ_i^t may result in less personalization. However, malicious attackers can upload noisy models to attack the system, or exaggerate their data amount to defraud extra payment and increase their weight in others' models. To address this issue, the graph learning procedure (see Box 2) considers both data amount and model similarity. Theorem 3 shows that malicious clients who upload abnormal N_i and θ_i are likely to be isolated, without introducing extra model-verification effort (e.g., testing models on a validation set).
Theorem 3
(Robustness against abnormal data amount) If A^t is given by the algorithm in Box 2 and N_i → +∞, then a_ji^t = 0 for all j ≠ i.
As a result, malicious clients whose priority is attacking the federation can only be trusted by other clients if they report a relatively small data volume. This limits their impact: if client j reports a smaller N_j, it receives a smaller reference weight (N_j/N_i) in the second term of Eq. (7), and other clients will not strongly emphasize its parameters in aggregation.
The consequence of uploading a fake model is similar. If client i is malicious and uploads a fake model θ_i^t (e.g., by perturbing the real model parameters), θ_i^t is very likely to differ from other clients' normal models trained on real datasets. The model difference term is then large, which increases the price of referencing client i as calculated by other clients and makes client i less likely to be chosen. Therefore, the influence of such malicious clients is also limited.
Box 2 Graph Topology Learning.
input: gain function g_i; reported N_j, c_j, and uploaded models θ_j^t for all j ≠ i.
initialization: ai = 0, n = 0
for j = 1 to m, j ≠ i do
Calculate the threshold Ñ_ij by solving g_i(Ñ_ij + N_j) − g_i(Ñ_ij) = λ (N_j/N_i) ∥θ_i^t − θ_j^t∥² + c_j.
if no solution then
Set Ñ_ij = 0. // client j is never a beneficial neighbor
end if
end for
for j = 1 to m − 1 do
k ← the unselected client with the largest threshold Ñ_ik
if n < Ñ_ik then
aik = 1 // Add the remaining client with the largest threshold
n = n + Nk
else
break // Stop adding once the total data amount reaches the threshold
end if
end for
output: ai.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
This research is supported by the National Key R&D Program of China under Grant 2021ZD0112801, NSFC under Grant 62450162 and 62171276, and the Science and Technology Commission of Shanghai Municipal under Grant 21511100900 and 22DZ2229005.
Author contributions
Enpei Zhang: Conceptualization, Formal Analysis, Investigation, Methodology, Software, Validation, Writing—Original Draft; Jingyi Chai: Conceptualization, Data Curation, Investigation, Methodology, Software, Validation, Visualization, Writing— Original Draft; Rui Ye: Conceptualization, Data Curation, Investigation, Methodology, Validation, Writing—Original Draft; Yanfeng Wang (Corresponding Author): Conceptualization, Funding Acquisition, Resources, Supervision, Writing—Review and Editing; Siheng Chen (Corresponding Author): Conceptualization, Funding Acquisition, Resources, Supervision, Writing—Review and Editing.
Peer review
Peer review information
Nature Communications thanks Qinya Li, Xueqin Liang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Data availability
All datasets used in this paper are publicly available. CIFAR-1036 and Fashion-MNIST37 are widely used benchmarks for image classification, each containing 10 categories. PACS38 has 7 domains and contains 7 categories. FEMNIST for image classification and Shakespeare for next-character prediction are from the naturally heterogeneous synthetic dataset Leaf39. The three finance datasets are: FiQA40, comprising 17k sentences sourced from microblog headlines and financial news; the Twitter Financial News Sentiment (TFNS)41, with 11,932 annotated finance-related tweets; and the News With GPT Instruction (NWGI)42, featuring labels generated by ChatGPT9. The code dataset CodeAlpaca43 contains 20K instruction-following examples. Specific data sources for the datasets used in this study are provided below: Fashion-MNIST: https://www.kaggle.com/datasets/zalando-research/fashionmnist, CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html, PACS: https://dali-dl.github.io/project_iccv2017.html, FEMNIST and Shakespeare: https://leaf.cmu.edu/, FiQA: https://huggingface.co/datasets/pauri32/fiqa-2018, TFNS: https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment, NWGI: https://huggingface.co/datasets/oliverwang15/news_with_gpt_instructions and CodeAlpaca: https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k. The preprocessed datasets used in our experiments are provided at 10.6084/m9.figshare.25139081.
Code availability
The complete code for our algorithm and the baselines is accessible at a public repository (https://github.com/19dx/iPFL) and is also available from Zenodo56 (10.5281/zenodo.15088232).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Enpei Zhang, Jingyi Chai, Rui Ye.
Contributor Information
Yanfeng Wang, Email: wangyanfeng@sjtu.edu.cn.
Siheng Chen, Email: sihengc@sjtu.edu.cn.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-025-62959-5.
References
- 1.Raffel, C. et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn Res.21, 5485–5551 (2020). [Google Scholar]
- 2.Gao, L. et al. The pile: an 800 GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027 (2020).
- 3.Schuhmann, C. et al. LAION-400M: Open dataset of CLIP-filtered 400 million image–text pairs. In NeurIPS Workshop on Data-Centric AI. (Jülich Supercomputing Center, 2021).
- 4.Schuhmann, C. et al. Laion-5b: an open large-scale dataset for training next generation image-text models. Adv. Neural Inf. Process Syst.35, 25278–25294 (2022). [Google Scholar]
- 5.Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process Syst.33, 1877–1901 (2020). [Google Scholar]
- 6.Ouyang, L. et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process Syst.35, 27730–27744 (2022). [Google Scholar]
- 7.Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF conference on computer vision and pattern recognition, 10684–10695 (IEEE, 2022).
- 8.Ramesh, A. et al. Zero-shot text-to-image generation. In Proc. International Conference on Machine Learning, 8821–8831 (PMLR, 2021).
- 9.OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
- 10.Kaplan, J. et al. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361 (2020).
- 11.Villalobos, P. et al. Will we run out of data? an analysis of the limits of scaling datasets in machine learning. arXiv preprint arXiv:2211.04325 (2022).
- 12.Muennighoff, N. et al. Scaling data-constrained language models. Adv. Neural Inf. Process. Syst.36, 50358–50376 (2023).
- 13.Wu, S. et al. Bloomberggpt: a large language model for finance. arXiv preprint arXiv:2303.17564 (2023).
- 14.Singhal, K. et al. Toward expert-level medical question answering with large language models. Nat. Med.31, 943–950 (2025). [DOI] [PMC free article] [PubMed]
- 15.Wang, Y. et al. How far can camels go? Exploring the state of instruction tuning on open resources. Adv. Neural Inf. Process. Syst.36, 74764–74786 (2023).
- 16.Chen, G. et al. PointGPT: Auto-regressively generative pre-training from point clouds. Adv. Neural Inf. Process. Syst.36, 29667–29679 (2023).
- 17.Voigt, P. & Von dem Bussche, A. The eu general data protection regulation (gdpr). Pract. Guide Cham Spring. Int. Publ.10, 10–5555 (2017). [Google Scholar]
- 18.Kairouz, P. et al. Advances and open problems in federated learning. Found. Trends Mach. Learn14, 1–210 (2021). [Google Scholar]
- 19.Price, W. N. & Cohen, I. G. Privacy in the age of medical big data. Nat. Med.25, 37–43 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Hathaliya, J. J. & Tanwar, S. An exhaustive survey on security and privacy issues in healthcare 4.0. Comput. Commun.153, 311–335 (2020). [Google Scholar]
- 21.Box, D. & Pottas, D. Improving information security behaviour in the healthcare context. Procedia Technol.9, 1093–1103 (2013). [Google Scholar]
- 22.Qi, T. et al. Differentially private knowledge transfer for federated learning. Nat. Commun.14, 3785 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Kaissis, G. et al. End-to-end privacy-preserving deep learning on multi-institutional medical imaging. Nat. Mach. Intell.3, 473–484 (2021). [Google Scholar]
- 24.Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol.10, 1–19 (2019). [Google Scholar]
- 25.Karimireddy, S. P., Guo, W. & Jordan, M. I. Mechanisms that incentivize data sharing in federated learning. arXiv preprint arXiv:2207.04557 (2022).
- 26.Wu, C. et al. A federated graph neural network framework for privacy-preserving personalization. Nat. Commun.13, 3091 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.T Dinh, C., Tran, N. & Nguyen, J. Personalized federated learning with Moreau envelopes. Adv. Neural Inf. Process Syst.33, 21394–21405 (2020). [Google Scholar]
- 28.Fallah, A., Mokhtari, A. & Ozdaglar, A. Personalized federated learning with theoretical guarantees: a model-agnostic meta-learning approach. Adv. Neural Inf. Process Syst.33, 3557–3568 (2020). [Google Scholar]
- 29.Ye, R., Ni, Z., Wu, F., Chen, S. & Wang, Y. Personalized federated learning with inferred collaboration graphs. In International Conference on Machine Learning, 39801–39817 (PMLR, 2023).
- 30.Li, T., Hu, S., Beirami, A. & Smith, V. Ditto: Fair and robust federated learning through personalization. In International Conference on Machine Learning, 6357–6368 (PMLR, 2021).
- 31.Huang, Y. et al. Personalized cross-silo federated learning on non-iid data. In Proc. AAAI conference on artificial intelligence, vol. 35, 7865–7873 (AAAI, 2021).
- 32.Sattler, F., Müller, K.-R. & Samek, W. Clustered federated learning: Model-agnostic distributed multitask optimization under privacy constraints. IEEE Trans. Neural Netw. Learn Syst.32, 3710–3722 (2020). [DOI] [PubMed] [Google Scholar]
- 33.Li, X. et al. Boosting geoscience data sharing in China. Nat. Geosci.14, 541–542 (2021). [Google Scholar]
- 34.Zhan, Y., Li, P., Qu, Z., Zeng, D. & Guo, S. A learning-based incentive mechanism for federated learning. IEEE Internet Things J.7, 6360–6368 (2020). [Google Scholar]
- 35.Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023).
- 36.Krizhevsky, A., Hinton, G. et al. Learning multiple layers of features from tiny images (2009).
- 37.Xiao, H., Rasul, K. & Vollgraf, R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017).
- 38.Li, D., Yang, Y., Song, Y.-Z. & Hospedales, T. M. Deeper, broader, and artier domain generalization. In Proc. IEEE international conference on computer vision, 5542–5550 (IEEE, 2017).
- 39.Caldas, S. et al. Leaf: A benchmark for federated settings. arXiv preprint arXiv:1812.01097 (2018).
- 40.Maia, M. et al. WWW'18 open challenge: financial opinion mining and question answering. In Companion Proceedings of The Web Conference 2018. https://api.semanticscholar.org/CorpusID:13866508 (2018).
- 41.Magic, N. Twitter financial news sentiment. https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment (2022).
- 42.Yang, H. Data-centric fingpt. open-source for open finance. https://github.com/AI4Finance-Foundation/FinGPT (2023).
- 43.Chaudhary, S. Code alpaca: An instruction-following llama model for code generation. https://github.com/sahil280114/codealpaca (2023).
- 44.McMahan, B., Moore, E., Ramage, D., Hampson, S. & y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data. In Proc. Artificial intelligence and statistics, 1273–1282 (PMLR, 2017).
- 45.Li, T. et al. Federated optimization in heterogeneous networks. Proc. Mach. Learn Syst.2, 429–450 (2020). [Google Scholar]
- 46.Zhang, M. et al. Personalized federated learning with first order model optimization. arXiv preprint arXiv:2012.08565 (2021).
- 47.Chen, M. et al. Evaluating large language models trained on code 2107.03374 (2021).
- 48.Wang, H., Yurochkin, M., Sun, Y., Papailiopoulos, D. & Khazaeni, Y. Federated learning with matched averaging. In Proc. International Conference on Learning Representationshttps://openreview.net/forum?id=BkluqlSFDS (2020).
- 49.Yurochkin, M. et al. Bayesian nonparametric federated learning of neural networks. In Proc. International Conference on Machine Learning, 7252–7261 (PMLR, 2019).
- 50.Acar, D. A. E. et al. Federated learning based on dynamic regularization. In Proc. International Conference on Learning Representations (IEEE, 2020).
- 51.Ye, R., Du, Y., Ni, Z., Chen, S. & Wang, Y. Fake it till make it: federated learning with consensus-oriented generation. arXiv preprint arXiv:2312.05966 (2023).
- 52.Jiang, A. Q. et al. Mistral 7b. arXiv preprint arXiv:2310.06825 (2023).
- 53.Zhang, P., Zeng, G., Wang, T. & Lu, W. Tinyllama: an open-source small language model 2401.02385 (2024).
- 54.Kang, J., Xiong, Z., Niyato, D., Xie, S. & Zhang, J. Incentive mechanism for reliable federated learning: a joint optimization approach to combining reputation and contract theory. IEEE Internet Things J.6, 10700–10714 (2019). [Google Scholar]
- 55.Zeng, R., Zeng, C., Wang, X., Li, B. & Chu, X. A comprehensive survey of incentive mechanism for federated learning. arXiv preprint arXiv:2106.15406 (2021).
- 56.Zhang, E., Chai, J., Ye, R., Wang, Y. & Chen, S. Incentivizing inclusive contributions in model sharing market. 10.5281/zenodo.15088232 (2025). [DOI] [PMC free article] [PubMed]