Author manuscript; available in PMC: 2021 May 26.

Published in final edited form as: Front Comput Sci. 2021 May 12;3:624683. doi: 10.3389/fcomp.2021.624683

Table 6.

The best classification results of the audio-based, text-based, and multi-modal models. AD: Alzheimer’s Disease. Accuracy: mean and standard deviation over 5 rounds. Best: highest accuracy across all epochs in the 5 rounds.

Input	Model (with pre-training)	Classes	Precision %	Recall %	F1 %	Accuracy %	Best %
Audio	YAMNet	non-AD	69.60 ± 6.80	59.20 ± 7.73	63.40 ± 5.57	66.20 ± 4.79	83.33
Audio	YAMNet	AD	64.40 ± 3.93	73.40 ± 8.82	68.60 ± 4.84	66.20 ± 4.79	83.33
Text	Longformer	non-AD	77.87 ± 3.75	90.00 ± 2.04	83.44 ± 2.33	82.08 ± 2.83	89.58
Text	Longformer	AD	88.14 ± 2.09	74.17 ± 5.53	80.44 ± 3.55	82.08 ± 2.83	89.58
Audio + Text	Dual BERT Concat / Joint (BERT large)	non-AD	83.62 ± 4.25	82.50 ± 5.53	82.80 ± 1.76	82.92 ± 1.56	87.50
Audio + Text	Dual BERT Concat / Joint (BERT large)	AD	83.04 ± 3.97	83.33 ± 5.89	82.92 ± 1.86	82.92 ± 1.56	87.50
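As a reading aid, the following minimal Python sketch recomputes F1 as the harmonic mean of the mean precision and mean recall copied from the table, and prints it next to the reported F1. The helper name `f1_score` and the `rows` dictionary are illustrative assumptions, not part of the original work. Small differences between the recomputed and reported values are expected if F1 was computed per round and then averaged; the source does not state which procedure was used.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (input, class) -> (mean precision %, mean recall %, reported mean F1 %),
# copied from Table 6. The standard deviations are omitted here.
rows = {
    ("Audio", "non-AD"): (69.60, 59.20, 63.40),
    ("Audio", "AD"): (64.40, 73.40, 68.60),
    ("Text", "non-AD"): (77.87, 90.00, 83.44),
    ("Text", "AD"): (88.14, 74.17, 80.44),
    ("Audio + Text", "non-AD"): (83.62, 82.50, 82.80),
    ("Audio + Text", "AD"): (83.04, 83.33, 82.92),
}

for (modality, label), (precision, recall, reported) in rows.items():
    recomputed = f1_score(precision, recall)
    print(f"{modality:12s} {label:6s} "
          f"recomputed F1 = {recomputed:.2f}, reported F1 = {reported:.2f}")
```

The recomputed values are only a consistency check on the table; they are not a substitute for the per-round results, which are not given here.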