Figure - PMC

[Preprint]. 2024 Aug 19:2024.08.14.607690. [Version 1] doi: 10.1101/2024.08.14.607690

PMC11370360.1; 2024 Aug 19
PMC11370360.2; 2024 Sep 20

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.

Fig. 1. a. Schematic of the brain-to-voice neuroprosthesis. Neural features extracted from four chronically implanted microelectrode arrays are decoded in real time and used to directly synthesize the participant's voice. b. Array locations on the left hemisphere and typical neuronal spikes from each microelectrode recorded over 1 s. Color overlays are estimated from a Human Connectome Project cortical parcellation. c. Closed-loop causal voice synthesis pipeline: voltages are sampled at 30 kHz; threshold crossings and spike-band power features are extracted from 1 ms segments; these features are binned into 10 ms non-overlapping bins, normalized, and smoothed. The Transformer model maps these neural features to a low-dimensional representation of speech consisting of Bark-frequency cepstral coefficients, pitch, and voicing, which are used as input to a vocoder. d. Lacking T15's ground-truth speech, we first generated synthetic speech from the known text cue in the training data using text-to-speech, and then used the neural activity itself to time-align the synthetic speech with the neural data time series at the syllable level, yielding a target speech waveform. e. Representative example of speech causally synthesized from neural data, which matches the target speech with high fidelity.
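The feature-extraction stage in panel c can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 30 kHz sampling rate, 1 ms segments, and 10 ms bins come from the caption; everything else is an assumption made for illustration: the input is taken to be already band-pass filtered to the spike band, a threshold crossing is counted when a 1 ms segment dips below a per-channel negative threshold, spike-band power is taken as mean squared voltage, and normalization and smoothing are done with a running z-score and an exponential filter (`alpha` is a hypothetical parameter). The function names are likewise hypothetical.

```python
import numpy as np

FS = 30_000          # sampling rate in Hz (from the caption)
SEG = FS // 1000     # samples per 1 ms segment (30)
SEGS_PER_BIN = 10    # 1 ms segments per 10 ms bin (from the caption)


def extract_features(voltages, thresholds):
    """Bin threshold crossings and spike-band power into 10 ms bins.

    voltages:   (n_samples, n_channels), assumed already band-pass filtered.
    thresholds: (n_channels,) negative voltage thresholds (assumed convention).
    Returns an array of shape (n_bins, 2 * n_channels).
    """
    n_samples, n_ch = voltages.shape
    n_bins = n_samples // (SEG * SEGS_PER_BIN)
    # Drop any trailing partial bin, then view as (bin, segment, sample, channel).
    v = voltages[: n_bins * SEGS_PER_BIN * SEG]
    v = v.reshape(n_bins, SEGS_PER_BIN, SEG, n_ch)
    # Assumption: a segment counts as one crossing if its minimum is below threshold.
    crossed = v.min(axis=2) < thresholds
    tc = crossed.sum(axis=1).astype(float)
    # Assumption: power is the mean squared voltage, averaged over the bin.
    sbp = (v ** 2).mean(axis=2).mean(axis=1)
    return np.concatenate([tc, sbp], axis=1)


def causal_normalize_smooth(features, alpha=0.2, eps=1e-6):
    """Running z-score plus exponential smoothing, using only past and current bins."""
    n_feat = features.shape[1]
    out = np.empty(features.shape, dtype=float)
    mean = np.zeros(n_feat)
    m2 = np.zeros(n_feat)       # running sum of squared deviations (Welford)
    smooth = np.zeros(n_feat)
    for t, x in enumerate(features):
        delta = x - mean
        mean = mean + delta / (t + 1)
        m2 = m2 + delta * (x - mean)
        std = np.sqrt(m2 / (t + 1))
        z = (x - mean) / (std + eps)
        smooth = alpha * z + (1 - alpha) * smooth
        out[t] = smooth
    return out
```

Calling `extract_features` on 100 ms of two-channel data (3000 samples) yields a `(10, 4)` array, one row per 10 ms bin. Because `causal_normalize_smooth` only looks backward in time, each output row could be passed to a decoder as soon as its bin closes, which is what a closed-loop system requires.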