[Preprint]. 2024 Sep 20:2024.08.14.607690. Originally published 2024 Aug 19. [Version 2] doi: 10.1101/2024.08.14.607690

Extended Data Fig. 2: Latencies of closed-loop brain-to-voice synthesis.

Cumulative latencies across the stages of the voice synthesis and audio playback pipeline are shown. Voice samples were synthesized from raw neural activity measurements within 10 ms, and the resulting audio was played aloud continuously to provide closed-loop feedback. Note that the linear horizontal axis is split to expand the visual dynamic range. We focused our engineering primarily on reducing the brain-to-voice inference latency, which fundamentally bounds the speech synthesis latency. As a result, the largest remaining contribution to the latency occurred after voice synthesis decoding, during the (comparatively more mundane) step of audio playback through a sound driver. The cumulative latencies with the audio driver settings used for T15 closed-loop synthesis are shown in dark gray. Audio playback latencies were subsequently lowered substantially through software optimizations (light gray), and we predict that further reductions will be possible with additional computer engineering.
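
As a reading aid for the caption above, the following minimal Python sketch shows how cumulative latencies of this kind can be computed from per-stage latencies, and how the largest single contributor can be identified. The stage names and all numeric values are hypothetical placeholders chosen for illustration only; they are not measurements from this study, and the helper functions `cumulative_latencies` and `largest_contributor` are illustrative names rather than part of the authors' pipeline.

```python
from itertools import accumulate


def cumulative_latencies(stages):
    """Return (stage name, cumulative latency in ms) pairs in pipeline order.

    `stages` is an ordered list of (name, latency_ms) tuples, where each
    latency is the time added by that stage alone.
    """
    names = [name for name, _ in stages]
    totals = accumulate(latency for _, latency in stages)
    return list(zip(names, totals))


def largest_contributor(stages):
    """Return the name of the stage that adds the most latency on its own."""
    return max(stages, key=lambda stage: stage[1])[0]


# Hypothetical placeholder values (ms), NOT measurements from the paper.
example_stages = [
    ("neural_feature_extraction", 2.0),
    ("brain_to_voice_inference", 6.0),
    ("audio_playback", 30.0),
]

example_cumulative = cumulative_latencies(example_stages)
example_bottleneck = largest_contributor(example_stages)

for name, total_ms in example_cumulative:
    print(f"{name}: {total_ms:.1f} ms cumulative")
print(f"largest single contribution: {example_bottleneck}")
```

With placeholder inputs of this shape, the final cumulative value is the end-to-end latency, and a late stage such as audio playback can dominate it even when the stages before it are fast.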