. 2024 Oct 19;13(20):6246. doi: 10.3390/jcm13206246

Table 1.

Some examples of current multimodal AI platforms.

AI Platform	Capabilities	Input Modalities	Typical Uses
GPT-4 Vision [18] (OpenAI)	Text generation; text and image comprehension, translation, summarization	Text, image, video	Content creation, conversation, coding assistance, data analysis, education, graphic design
DALL-E [19] (OpenAI)	Image generation from textual descriptions	Text	Graphic design, art creation, visual content generation, advertising
CLIP [20] (OpenAI)	Understanding and classifying images in the context of natural language	Text, image	Image search, analysis, classification based on textual descriptions
Whisper [21] (OpenAI)	Speech-to-text transcription, speech translation into English	Audio	Transcription services, language translation of spoken content, accessibility tools
Copilot [22] (GitHub)	Code generation and suggestion based on natural language	Text	Software development assistance, debugging, code review, educational tools
Gemini [23] (Google)	Text and image generation, comprehension, translation, summarization	Text, image, video	Conversational agents, customer service bots, personal assistants, interactive storytelling, education
Perceiver [24] (DeepMind)	Processing and integrating different types of data	Text, image, audio, video	Universal data processing, cross-modal information retrieval, games, simulations, research
Midjourney [25]	Image generation based on textual prompts	Text	Visual storytelling, concept art, design exploration
Stable Diffusion [26]	Text-to-image generation, image editing	Text, image	Content creation, digital art, image editing, marketing
Llama [27] (Meta AI)	Text and image generation, comprehension, translation, summarization	Text, image	Content creation, conversational interfaces, data analysis, educational tools