Huemann et al (2023a) |
Single dataset; limited multimodal datasets; primary annotator bias; reliance on language as a required input; comparison challenges |
Diverse datasets; inter-observer variability; task expansion; reducing language dependency; AI-assisted clinical workflow |
Huang et al (2021) |
Focus on chest radiographs; dependence on report quality; designed for English-language reports |
Expand to other modalities and regions; improve robustness to report quality; adapt for different languages |
Li et al (2023d) |
2D segmentation limitation; manual text annotation requirement |
3D segmentation; automating text annotation generation |
Khare et al (2021) |
Small labeled datasets; model interpretability; incomplete consideration of expert diagnosis |
Larger datasets; enhanced interpretability; clinical integration |
Chen et al (2022) |
Handling the complexity and diversity of medical image-text data |
Sophisticated techniques for diverse medical scenarios |
Delbrouck et al (2022) |
Visio-linguistic reasoning and understanding; overfitting and generalization; transparency in medical AI; NLG evaluation in medical AI |
Enhanced interpretability in multimodal scenarios; refined training methodologies; improved documentation and accessibility; advanced evaluation metrics |