2024 Oct 19;13(20):6246. doi: 10.3390/jcm13206246

Table 1. Examples of current multimodal AI platforms.

| AI Platform | Capabilities | Input Modalities | Typical Uses |
|---|---|---|---|
| GPT-4 Vision [18] (OpenAI) | Text and image generation, comprehension, translation, summarization | Text, image, video | Content creation, conversation, coding assistance, data analysis, education, graphic design |
| DALL-E [19] (OpenAI) | Image generation from textual descriptions | Text | Graphic design, art creation, visual content generation, advertising |
| CLIP [20] (OpenAI) | Understanding and classifying images in the context of natural language | Text, image | Image search, analysis, classification based on textual descriptions |
| Whisper [21] (OpenAI) | Speech-to-text transcription, translation | Audio | Transcription services, language translation of spoken content, accessibility tools |
| Copilot [22] (GitHub) | Code generation and suggestion based on natural language | Text | Software development assistance, debugging, code review, educational tools |
| Gemini [23] (Google) | Text and image generation, comprehension, translation, summarization | Text, image, video | Conversational agents, customer service bots, personal assistants, interactive storytelling, education |
| Perceiver [24] (DeepMind) | Processing and integrating different types of data | Text, image, audio, video | Universal data processing, cross-modal information retrieval, games, simulations, research |
| Midjourney [25] | Image generation based on textual prompts | Text | Visual storytelling, concept art, design exploration |
| Stable Diffusion [26] | Text-to-image generation, image editing | Text, image | Content creation, digital art, image editing, marketing |
| Llama [27] (Meta AI) | Text and image generation, comprehension, translation, summarization | Text, image | Content creation, conversational interfaces, data analysis, educational tools |
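To illustrate what "classifying images in the context of natural language" means for CLIP [20], the following sketch performs zero-shot classification. It scores one image embedding against several text-prompt embeddings by cosine similarity and converts the scores to probabilities with a softmax. This is a minimal sketch under stated assumptions. The three-dimensional vectors are hypothetical stand-ins for the outputs of CLIP's trained image and text encoders, whose real embeddings have hundreds of dimensions. The scaling factor of 100 is an illustrative value for the learned temperature and is not taken from this article. The prompt labels are likewise hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_emb, label_embs, scale=100.0):
    """CLIP-style zero-shot classification (sketch).

    image_emb: embedding of one image (assumed precomputed by an image encoder).
    label_embs: dict mapping a text prompt to its embedding (assumed precomputed
    by a text encoder). scale: assumed temperature applied before the softmax.
    Returns (best_label, {label: probability}).
    """
    labels = list(label_embs)
    logits = [scale * cosine(image_emb, label_embs[lab]) for lab in labels]
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    probs = {lab: e / total for lab, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical toy embeddings standing in for real encoder outputs.
image_embedding = [0.9, 0.1, 0.2]
prompt_embeddings = {
    "a chest X-ray": [0.8, 0.2, 0.1],
    "a skin photograph": [0.1, 0.9, 0.3],
    "a histology slide": [0.2, 0.3, 0.9],
}

label, probs = zero_shot_classify(image_embedding, prompt_embeddings)
print(label)  # prints "a chest X-ray"
```

Changing the set of text prompts changes the candidate classes without retraining the model. This property underlies the image-search and text-based classification uses listed for CLIP.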