Table 1.
Comparison of model performance (accuracy or pass rate, %) and utility of different algorithms in two instruction-tuning scenarios with three LLMs (Mistral, TinyLlama, and Llama2).
| Setting | Metric | Local | FedAvg | FedProx | FedAMP | CFL | pFedGraph | iPFL |
|---|---|---|---|---|---|---|---|---|
| Mistral (Mix-Finance) | FIQA | 85.83 ± 4.55 | 82.19 ± 4.54 | 85.11 ± 6.61 | 87.65 ± 5.08 | 87.65 ± 3.02 | 87.65 ± 6.11 | 87.64 ± 1.99 |
| | TFNS | 81.09 ± 4.36 | 83.33 ± 0.00 | 75.17 ± 5.66 | 78.75 ± 8.13 | 83.67 ± 2.83 | 83.83 ± 4.24 | 84.83 ± 1.41 |
| | NWGI | 54.92 ± 0.12 | 57.5 ± 0.95 | 58.59 ± 0.12 | 55.00 ± 3.07 | 54.75 ± 2.72 | 56.00 ± 4.00 | 58.67 ± 2.83 |
| | Avg-Utility | 0.0 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 46.1 ± 0.3 | 96.5 ± 0.0 |
| TinyLlama (Mix-Finance) | FIQA | 82.56 ± 4.02 | 77.09 ± 0.40 | 79.27 ± 1.65 | 83.64 ± 1.46 | 82.20 ± 5.56 | 82.56 ± 4.02 | 82.19 ± 2.47 |
| | TFNS | 76.16 ± 7.07 | 76.25 ± 0.11 | 75.92 ± 0.35 | 77.92 ± 2.47 | 79.33 ± 2.59 | 80.67 ± 0.71 | 78.17 ± 2.12 |
| | NWGI | 48.92 ± 1.76 | 52.67 ± 1.18 | 53.25 ± 0.35 | 47.92 ± 3.66 | 48.42 ± 2.24 | 47.75 ± 0.82 | 50.42 ± 2.47 |
| | Avg-Utility | 0.0 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 96.5 ± 0.0 |
| Llama2 (Mix-Finance) | FIQA | 84.02 ± 6.09 | 78.19 ± 1.94 | 78.55 ± 0.40 | 84.01 ± 4.03 | 85.11 ± 6.61 | 83.65 ± 5.57 | 85.47 ± 6.10 |
| | TFNS | 80.58 ± 0.83 | 81.25 ± 5.30 | 80.56 ± 6.28 | 76.63 ± 5.48 | 77.06 ± 6.98 | 76.75 ± 5.66 | 83.38 ± 2.83 |
| | NWGI | 43.17 ± 4.48 | 52.25 ± 4.77 | 52.44 ± 1.68 | 42.56 ± 6.98 | 45.94 ± 4.68 | 47.94 ± 3.09 | 56.25 ± 1.06 |
| | Avg-Utility | 0.0 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 45.9 ± 0.0 | 96.5 ± 0.0 |
| Llama2 (Code&Finance) | NWGI | 50.61 ± 2.63 | 49.94 ± 2.46 | 49.61 ± 2.67 | 51.58 ± 1.38 | 52.28 ± 4.97 | 50.00 ± 5.65 | 53.11 ± 0.51 |
| | Code | 13.54 ± 2.30 | 15.00 ± 0.70 | 15.00 ± 1.26 | 14.02 ± 0.86 | 14.15 ± 0.80 | 14.27 ± 1.76 | 15.85 ± 1.14 |
| | Avg-Utility | 0.0 ± 0.0 | 58.2 ± 150.4 | 58.2 ± 150.4 | 58.2 ± 150.4 | 58.2 ± 150.4 | 137.7 ± 291.6 | 208.1 ± 131.1 |
Every value is reported as mean ± standard deviation. The first scenario, Mix-Finance, consists of six clients: two each for FIQA, TFNS, and NWGI. Each row shows either the average performance of the two clients that share a dataset or the utility of all clients. The second scenario, Code&Finance, includes three financial clients and five coding clients; each row shows either the average performance of the clients that share a task or the utility of all clients. Our iPFL consistently outperforms the other baselines in clients' utility and achieves the highest average model performance in most of the scenarios.
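
As an illustration of how such per-dataset entries could be aggregated, the following minimal sketch groups per-client scores by dataset and formats each group as mean ± standard deviation. The client scores are hypothetical placeholders rather than values from the experiments, the helper names `summarize` and `format_cell` are introduced only for this example, and the use of the population standard deviation is an assumption, since the table does not state which estimator is used.

```python
from statistics import mean, pstdev


def summarize(scores_by_group):
    """Map each group name to (mean, population std) of its client scores."""
    return {g: (mean(v), pstdev(v)) for g, v in scores_by_group.items()}


def format_cell(m, s, digits=2):
    """Render one table cell in the 'mean ± std' style used in the table."""
    return f"{m:.{digits}f} ± {s:.{digits}f}"


# Hypothetical per-client accuracies (%), two clients per dataset.
# These numbers are placeholders and do NOT come from the paper.
client_scores = {
    "FIQA": [80.0, 90.0],
    "TFNS": [83.0, 83.0],
}

summary = summarize(client_scores)
cells = {g: format_cell(m, s) for g, (m, s) in summary.items()}

for group, cell in cells.items():
    print(f"{group}: {cell}")
```

With two clients per dataset, a standard deviation of 0.00 simply means both clients obtained the same score, as in the hypothetical TFNS group above.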