Preprint (bioRxiv), version 2, posted 2024 Sep 20; originally published 2024 Aug 19. doi: 10.1101/2024.08.14.607690

Fig. 1. Closed-loop voice synthesis from intracortical neural activity in a participant with ALS.


a. Schematic of the brain-to-voice neuroprosthesis. Neural features extracted from four chronically implanted microelectrode arrays were decoded in real time and used to synthesize voice directly. b. Array locations on the participant's left hemisphere and typical neuronal action potentials from each microelectrode. Color overlays are estimated from a Human Connectome Project cortical parcellation. c. Closed-loop causal voice synthesis pipeline: voltages were sampled at 30 kHz; threshold-crossing and spike-band power features were extracted from 1 ms segments; these features were binned into non-overlapping 10 ms bins, normalized, and smoothed. The Transformer-based decoder mapped these neural features to a low-dimensional representation of speech comprising Bark-frequency cepstral coefficients, pitch, and voicing, which served as input to a vocoder. The vocoder then generated speech samples that were played continuously through a speaker. d. Because ground-truth speech from T15 was unavailable, we first generated synthetic speech from the known text cue in the training data using a text-to-speech algorithm, and then used the neural activity itself to time-align the synthetic speech with the neural time series at the syllable level, yielding a target speech waveform for training the decoder. e. A representative example of causally synthesized speech from neural data, which closely matches the target speech.
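A minimal sketch of the panel-c feature-extraction stage, assuming raw band-passed voltages sampled at 30 kHz per electrode. The caption specifies threshold-crossing and spike-band power features from 1 ms segments, non-overlapping 10 ms bins, normalization, and smoothing; the -4.5x RMS threshold, z-scoring, and exponential smoothing constant below are illustrative assumptions, not the paper's stated parameters.

```python
import numpy as np

FS = 30_000   # sampling rate (Hz), per the caption
SEG = 30      # 1 ms segment = 30 samples at 30 kHz
BIN = 10      # 10 consecutive 1 ms segments -> one 10 ms bin

def extract_features(volts, thresh=-4.5):
    """volts: (n_samples, n_electrodes) band-passed voltages.
    Returns (n_bins, 2 * n_electrodes): threshold-crossing rates and
    spike-band power, in non-overlapping 10 ms bins."""
    n = (volts.shape[0] // (SEG * BIN)) * SEG * BIN
    segs = volts[:n].reshape(-1, SEG, volts.shape[1])   # 1 ms segments
    # per-electrode threshold at thresh * RMS (a common convention; assumed)
    t = thresh * np.sqrt(np.mean(volts[:n] ** 2, axis=0))
    crossings = (segs.min(axis=1) < t).astype(np.float64)  # spike in segment?
    power = np.mean(segs ** 2, axis=1)                     # spike-band power
    feats = np.concatenate([crossings, power], axis=1)
    feats = feats.reshape(-1, BIN, feats.shape[1]).mean(axis=1)  # 10 ms bins
    # normalize (z-score) and smooth causally with an exponential filter
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-6)
    alpha, smoothed = 0.3, np.zeros_like(feats)
    for i in range(len(feats)):
        prev = smoothed[i - 1] if i else feats[i]
        smoothed[i] = alpha * feats[i] + (1 - alpha) * prev
    return smoothed
```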
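The decoder stage in panel c can be pictured as a causal Transformer mapping each 10 ms bin of neural features to per-frame acoustic parameters (Bark-frequency cepstral coefficients, pitch, and a voicing flag) consumed by the vocoder. The sketch below is a hedged PyTorch stand-in; layer counts, widths, and the 512-feature input dimension are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BrainToVoiceDecoder(nn.Module):
    def __init__(self, n_features=512, d_model=256, n_bfcc=25):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_bfcc + 2)  # BFCCs + pitch + voicing

    def forward(self, x):
        # x: (batch, time, n_features), one step per 10 ms bin
        t = x.shape[1]
        # causal mask: each frame attends only to past/present bins,
        # as required for closed-loop (causal) synthesis
        mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        h = self.encoder(self.proj(x), mask=mask.to(x.device))
        out = self.head(h)
        bfcc, pitch, voiced = out[..., :-2], out[..., -2], out[..., -1]
        return bfcc, pitch, voiced  # per-frame vocoder inputs
```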
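For the panel-d alignment, one generic way to warp synthetic text-to-speech audio onto the neural time axis is dynamic time warping; the caption does not specify the paper's syllable-level procedure, so the DTW below is only a stand-in sketch that assumes a precomputed frame-mismatch cost matrix between neural and speech frames.

```python
import numpy as np

def dtw_path(cost):
    """cost: (n_neural_frames, n_speech_frames) local mismatch.
    Returns the monotonic warp path as a list of (neural, speech) index pairs."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    # accumulate minimal-cost alignment over diagonal/vertical/horizontal steps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # backtrack from the end to recover the warp path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

The warp path can then be used to resample the synthetic speech frames onto the neural frame grid, producing the training target waveform described in panel d.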