TABLE 2.
Summary of AI models and processes applied in the Zequinha project.
| Category | Component/Tool | Description |
|---|---|---|
| Embedded AI models | Language processing (LLM) | Gemma 3 (under evaluation) for dialogue generation, context, and reasoning |
| | Voice synthesis (TTS) | VITS-based model trained for Zequinha's custom voice |
| | Lip sync | VITS component that generates mouth-movement parameters from audio |
| | Speech recognition (STT) | Audio-to-text transcription model |
| | Spontaneous movement generation | Transformer-based model for contextual body animations |
| | Facial detection | Model that locates human faces in the camera feed |
| | Facial recognition (identity) | Model that generates embeddings to identify recurring users |
| | Facial landmark detection | Model that maps eyes and mouth for tracking |
| | Facial attribute estimation | Model that estimates age and gender, adapting Zequinha's performance |
| | Intention detection | NLP model that classifies the user's speech intent |
| AI engineering | Model optimization (TensorRT) | Model compilation and quantization for optimized execution on the Jetson |
| | Prompt engineering | Creation and management of prompts that shape the LLM's personality and responses |
| Supporting AIs | Code assistants | Gemini 2.5 Pro, Grok, DeepSeek |
| | Media generation (design) | ChatGPT (DALL-E) for 2D images, Rodin for 3D models |
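The identity pipeline in the table pairs a facial recognition model that emits embeddings with a lookup over known users. A minimal sketch of that matching step, assuming a generic embedding comparison by cosine similarity (the function names, toy 3-dimensional vectors, and 0.8 threshold are illustrative, not the project's implementation):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def match_user(embedding, known_users, threshold=0.8):
    """Return the best-matching known user, or None if no stored
    embedding exceeds the similarity threshold."""
    best_name, best_score = None, threshold
    for name, stored in known_users.items():
        score = cosine_similarity(embedding, stored)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy embeddings stand in for real model output (hundreds of dimensions).
known = {"ana": [0.9, 0.1, 0.0], "bruno": [0.0, 0.8, 0.6]}
print(match_user([0.88, 0.15, 0.02], known))  # close to "ana"
```

In a real deployment the embeddings would come from the on-device recognition model, and the threshold would be tuned to balance false matches against missed recurring users.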