2026 Feb 25;13:1765950. doi: 10.3389/frobt.2026.1765950

TABLE 2.

Summary of AI models and processes applied in the Zequinha project.

| Category | Component/Tool | Description |
| --- | --- | --- |
| Embedded AI models | Language processing (LLM) | Gemma 3 (under evaluation) for dialogue generation, context, and reasoning |
| Embedded AI models | Voice synthesis (TTS) | VITS-based model trained for Zequinha’s custom voice |
| Embedded AI models | Lip sync | VITS component that generates mouth movement parameters from audio |
| Embedded AI models | Speech recognition (STT) | Audio-to-text transcription model |
| Embedded AI models | Spontaneous movement generation | Transformer-based model for contextual body animations |
| Embedded AI models | Facial detection | Model that locates human faces in the camera feed |
| Embedded AI models | Facial recognition (identity) | Model that generates embeddings to identify recurring users |
| Embedded AI models | Facial landmark detection | Model that maps the eyes and mouth for tracking |
| Embedded AI models | Facial attribute estimation | Model that estimates age and gender so that Zequinha can adapt its performance |
| Embedded AI models | Intention detection | NLP model that classifies the intent of the user’s speech |
| AI engineering | Model optimization (TensorRT) | Model compilation and quantization for optimized inference on the Jetson |
| AI engineering | Prompt engineering | Creation and management of the prompts that shape the LLM’s personality and responses |
| Supporting AIs | Code assistants | Gemini 2.5 Pro, Grok, DeepSeek |
| Supporting AIs | Media generation (design) | ChatGPT (DALL-E) for 2D images, Rodin for 3D models |
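
The facial recognition entry in Table 2 describes a model that generates embeddings to identify recurring users, but the table does not say how embeddings are compared. The following is a minimal sketch of one common approach, cosine similarity against stored reference embeddings. It is an assumption, not the project's documented method: the function names, the example users, the three-dimensional vectors, and the 0.8 threshold are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def identify_user(
    embedding: Sequence[float],
    known_users: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical cutoff, would need tuning on real data
) -> Optional[str]:
    """Return the best-matching known user, or None if nobody clears the threshold."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, reference in known_users.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical reference embeddings; real face embeddings typically have
# hundreds of dimensions and come from the recognition model itself.
known_users = {
    "ana": [0.9, 0.1, 0.0],
    "bruno": [0.0, 0.2, 0.95],
}

returning_visitor = identify_user([0.88, 0.15, 0.02], known_users)
unknown_visitor = identify_user([0.5, 0.5, 0.5], known_users)
```

In this sketch a face that matches no stored embedding closely enough returns `None`, which a robot could treat as a first-time visitor. The threshold trades false matches against missed recognitions and would have to be calibrated for the embedding model actually in use.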