TABLE 2.
Summary of AI models and processes applied in the Zequinha project.
| Category | Component/Tool | Description |
|---|---|---|
| Embedded AI models | Language processing (LLM) | Gemma 3 (under evaluation) for dialogue generation, context, and reasoning |
| | Voice synthesis (TTS) | VITS-based model trained for Zequinha's custom voice |
| | Lip sync | VITS component that generates mouth-movement parameters from audio |
| | Speech recognition (STT) | Audio-to-text transcription model |
| | Spontaneous movement generation | Transformer-based model for contextual body animations |
| | Facial detection | Model that locates human faces in the camera feed |
| | Facial recognition (identity) | Model that generates embeddings to identify recurring users |
| | Facial landmark detection | Model that maps eyes and mouth for tracking |
| | Facial attribute estimation | Model that estimates age and gender, adapting Zequinha's performance |
| | Intention detection | NLP model that classifies the user's speech intent |
| AI engineering | Model optimization (TensorRT) | Model compilation and quantization for optimized execution on the Jetson |
| | Prompt engineering | Creation and management of prompts that shape the LLM's personality and responses |
| Supporting AIs | Code assistants | Gemini 2.5 Pro, Grok, DeepSeek |
| | Media generation (design) | ChatGPT (DALL-E) for 2D images, Rodin for 3D models |
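The identity pipeline in the table pairs a facial recognition model that emits embeddings with a lookup over known users. A minimal sketch of that matching step, assuming a generic embedding comparison by cosine similarity (the function names, toy 3-dimensional vectors, and 0.8 threshold are illustrative, not the project's implementation):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def match_user(embedding, known_users, threshold=0.8):
    """Return the best-matching known user, or None if no stored
    embedding exceeds the similarity threshold."""
    best_name, best_score = None, threshold
    for name, stored in known_users.items():
        score = cosine_similarity(embedding, stored)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy embeddings stand in for real model output (hundreds of dimensions).
known = {"ana": [0.9, 0.1, 0.0], "bruno": [0.0, 0.8, 0.6]}
print(match_user([0.88, 0.15, 0.02], known))  # close to "ana"
```

In a real deployment the embeddings would come from the on-device recognition model, and the threshold would be tuned to balance false matches against missed recurring users.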