Summary
Here, we present a protocol to decode Mandarin sentences from invasive neural recordings using a brain-to-text framework. We describe steps for preparing materials, including designing the sentence corpus and setting up electrocorticography (ECoG) recording systems. We then detail procedures for decoding, such as data preprocessing, selection of speech-responsive electrodes, speech detection, syllable and tone decoding, and language modeling. We also outline performance evaluation metrics.
For complete details on the use and execution of this protocol, please refer to Zhang et al.1
Subject areas: Neuroscience, Cognitive Neuroscience, Computer sciences
Graphical abstract

Highlights
• Instructions for designing appropriate language tasks for training
• Steps for constructing a language model converting tonal syllables into sentences
• Guidance on identifying speech-responsive and discriminative electrodes
• Procedures for implementing multi-stream neural decoders to form a decoding pipeline
Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.
Before you begin
Protocol overview
The protocol described below outlines the steps for decoding Mandarin sentences from invasive neural recordings using a brain-to-text framework. This protocol was specifically designed for use with electrocorticography (ECoG) recordings from patients with eloquent brain tumors who underwent awake surgery. However, the framework can be adapted for use with other types of neural recordings and patient populations.
Materials and equipment setup
Please refer to the materials and equipment section for a list of necessary materials and equipment.
Preparation
Before beginning the protocol, ensure that you have:
1. Obtained approval from the relevant institutional review board and informed consent from participants.
2. Prepared the necessary materials and equipment, including ECoG recording equipment and software for data analysis.
3. Familiarized yourself with the brain-to-text framework and the specific steps outlined in this protocol.
Note: This protocol involves working with human participants and invasive neural recordings. It is essential to follow all relevant guidelines and regulations to ensure participant safety and data integrity.
Optional: If you plan to use this protocol with a different patient population or type of neural recording, you may need to modify the protocol accordingly. Please consult with relevant experts and ensure that you have obtained necessary approvals and permissions before proceeding.
Pause point: The protocol can be paused at various points, including after data collection and before data analysis. Please ensure that you have properly stored and secured all data and materials before pausing the protocol.
CRITICAL: It is essential to follow proper protocols for handling and storing neural recordings to ensure data integrity and participant safety.
Institutional permissions
All experiments were performed in accordance with relevant institutional and national guidelines and regulations. Huashan Hospital Institutional Review Board of Fudan University (HIRB, KY2019-538) and the Institutional Review Board of ShanghaiTech University (ShanghaiTech SBE IRB#2022-003) approved this protocol. Participation in the study was voluntary, and participants had the right to withdraw at any time. Privacy and confidentiality were strictly protected. There were no additional costs or compensation for participants. The principal investigators and the contact information for the ethics committee and emergency situations were provided to the participants.
Note: Readers need to acquire permissions from the relevant institutions and obtain approval from their own institutional review board before conducting similar experiments. It is essential to ensure that all experiments conform to the relevant regulatory standards and guidelines.
Experimental material preparation
Timing: 2–3 weeks total
4. Design and Construct the Sentence Corpus (1–2 weeks; see the sketch after this step).
a. Select the top 10 most frequently used open syllables with a monophthong from the target language database (e.g., the Center for Chinese Linguistics (CCL) PKU Corpus for Mandarin Chinese).2
b. Generate 40 distinct characters by combining the 10 syllables with the 4 lexical tones.
c. Construct 29 Chinese words and phrases using the 40 characters.
d. Create 10 complete sentences (3–4 phrases, i.e., 5–8 Chinese characters per sentence) using the 29 phrases.
CRITICAL: The selected syllables should cover a large share of characters in common usage (approximately 25% of all characters) to ensure representativeness.
Note: The corpus design can be adapted to other tonal languages while maintaining similar structural principles. Additionally, this experimental material is designed for a 30-minute session, based on the recording time that is actually feasible. Each syllable needs at least 45–60 repetitions to ensure high decoding accuracy. If more time is available, the size of the corpus can be increased accordingly.
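The corpus bookkeeping in step 4 can be checked with a short script. The sketch below is illustrative only: the base syllables and the sentence list are hypothetical placeholders, not the items actually selected from the CCL corpus.

from itertools import product

# Hypothetical base syllables; replace with the 10 open syllables selected from the corpus.
base_syllables = ["ma", "mi", "mu", "da", "di", "du", "na", "ni", "nu", "la"]
tones = [1, 2, 3, 4]

# 10 base syllables x 4 lexical tones -> 40 tonal syllables (one character each).
tonal_syllables = [f"{syl}{tone}" for syl, tone in product(base_syllables, tones)]
assert len(tonal_syllables) == 40

# Check that each base syllable is repeated at least 45-60 times when the 10 sentences
# (5-8 characters each) are presented once per trial across 16 trials.
sentences = [["ma1", "mi3", "du4", "na2", "la1"]] * 10   # placeholder sentence list
counts = {syl: 0 for syl in base_syllables}
for sentence in sentences * 16:
    for char in sentence:
        counts[char[:-1]] += 1
print(counts)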
5. Prepare Visual Presentation Materials (3–5 days).
a. Create slide presentations for each trial containing:
i. A black fixation cross on a white background (optional; we use a 30 s duration for a short rest and preparation).
ii. A gray fixation cross (>2 s duration as a reminder; we use 3 s in our design).
iii. The target sentence in gray text.
iv. Sequential highlighting of individual characters in black.
CRITICAL: Ensure the visual cue timing allows the participant to produce each character separately and clearly (>0.8 s per character for most participants; we use 1.2 s per character in our design).
b. Program inter-stimulus intervals:
i. Set intervals between sentences (>2 s duration for a short breath; we use 3 s in our design).
ii. Set intervals between trials (>2 s duration for a short breath; we use 3 s in our design).
Note: The timing does not need to enforce strict alignment for participants in the recording setting.
6. Organize Trial Structure (2–3 days; see the sketch after this step).
a. Arrange 4 blocks of trials:
i. 3 optimization blocks (Trials 1–12).
ii. 1 evaluation block (Trials 13–16).
b. Randomize sentence order within each trial.
c. Program each trial to present all 10 sentences once.
CRITICAL: The number of trials and blocks should be adjusted based on specific experimental requirements and patient conditions.
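A minimal sketch of the trial structure in step 6 is shown below, assuming the 10 corpus sentences are indexed 0–9; trials 1–12 form the three optimization blocks and trials 13–16 the evaluation block.

import random

sentence_ids = list(range(10))      # the 10 corpus sentences

trial_orders = []
for trial in range(1, 17):          # 16 trials in total
    order = sentence_ids.copy()
    random.shuffle(order)           # randomize sentence order within each trial
    trial_orders.append(order)

for i, order in enumerate(trial_orders, start=1):
    block = "optimization" if i <= 12 else "evaluation"
    print(f"Trial {i:2d} ({block}): {order}")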
7. Test and Validate Materials (2–3 days).
a. Verify comfortable display timing of visual cues.
b. Test randomization of sentence presentation.
c. Confirm proper recording of acoustic trial markers.
d. Validate synchronization between visual cues and data acquisition systems.
Participant selection, consent, and task familiarization
Timing: 2–3 weeks total
8. Screen and Select Participants (1–2 weeks).
a. Inclusion Criteria:
i. Native speakers.
ii. Adults with eloquent brain tumors or epilepsy requiring awake surgery.
iii. No other major medical conditions.
iv. No cognitive or neurological deficits.
v. Age range: 18–70 years old.
vi. Able to perform speech tasks.
vii. No speech/language disorders.
b. Exclusion Criteria:
i. Non-native speakers.
ii. Significant neurological deficits.
iii. Unable to cooperate with task requirements, including participants unable to recognize visual cues and words on the portable screen due to impaired eyesight.
iv. Severe medical conditions.
v. Contraindications for awake surgery.
vi. Speech/language disorders.
c. Document demographic information (age, gender, dialect background).
d. Confirm absence of speech/language disorders.
CRITICAL: All procedures follow standard clinical protocols. Patient safety and health take priority at every step.
9. Medical Evaluation and Surgical Planning (3–5 days).
a. Have an experienced neurosurgeon available for grid placement.
b. Perform preoperative computed tomography (CT) and T1 magnetic resonance imaging (MRI) scanning.
c. Double-check:
i. Lesion location.
ii. That the required surgical exposure meets the criteria.
iii. Contraindications for awake surgery.
d. Determine the grid location based on the exposure and tumor location.
10. Obtain Informed Consent (1–2 days).
a. Submit the protocol for Huashan Hospital IRB approval (Protocol #KY2019-538).
b. Provide detailed study information to participants.
i. Clarify study procedures and risks.
ii. Explain voluntary participation and privacy protection.
c. Obtain written informed consent prior to surgery.
d. Document all approvals and consent forms.
11. Task Familiarization and Training (1–2 days).
a. Explain the task procedures and visual cue system.
b. Demonstrate proper timing requirements:
i. 1.2 s per character display.
ii. 3 s intervals between sentences.
c. Practice sample sentences following visual cues.
d. Verify the participant's comfort with task pacing.
CRITICAL: Ensure participants understand the importance of following visual cues for proper timing.
Note: The exact timing may vary depending on institutional procedures and participant availability. All data collection should follow standard clinical procedures with strict privacy protection.
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Software and algorithms | ||
| Code for data preprocessing and statistical analysis | This paper | https://github.com/yuanningli/tonal_BCI_decoding |
| PRAAT (version 6.1.01) | Paul Boersma and David Weenink, Phonetic Sciences, University of Amsterdam | https://github.com/praat/praat |
| FreeSurfer (v.7.4.1) | Laboratory for Computational Neuroimaging, the Athinoula A. Martinos Center for Biomedical Imaging. | https://surfer.nmr.mgh.harvard.edu/ |
| img_pipe (v.2.0) | Chang Lab, UCSF | https://github.com/ChangLabUcsf/img_pipe |
| Synapse – Neurophysiology Suite | Tucker-Davis Technologies | https://www.tdt.com/component/synapse-software/ |
| Other | ||
| Tucker-Davis Technologies ECoG system | Tucker-Davis Technologies | https://www.tdt.com/docs/hardware/pz5m-medically-isolated-neurodigitizer/#pz5m-touchscreen |
| Cortac 128 high-density electrode array | PMT Corporation | http://www.pmtcorp.com/cortac.html |
| Portable screen | Sinorad Corporation | http://www.sinorad.cn/pro-show-137.html |
Materials and equipment
• ECoG Recording System.
○ Tucker-Davis Technologies ECoG system.
- ECoG sampling rate: >400 Hz for later filtering and Hilbert transformation (we use the 3052 Hz option on the TDT device).
- Audio sampling rate: >16000 Hz for accurate transcription and annotation (we use the 24414 Hz option on the TDT device).
CRITICAL: Data collection systems must be synchronized for temporal alignment, and regular calibration of the recording equipment is required.
○ Two 128-channel high-density electrode arrays per participant.
CRITICAL: The ECoG arrays must be sterilized using ethylene oxide according to standard surgical protocols.
○ Mounted microphone for simultaneous audio recording.
• Computational Infrastructure.
○ A high-performance computing server consisting of two Intel Xeon 4314 processors and three A100 80G GPUs, along with a storage server with a capacity of 144 TB.
○ BrainLab neuronavigation system for electrode localization and visualization.
○ Hospital web system for data storage and security.
• Software Requirements.
○ PRAAT (Version 6.1.01) for audio annotation and analysis.
- Source: https://www.fon.hum.uva.nl/praat/.
○ PyTorch for deep learning implementation.
○ Custom Python code for:
- Electrode visualization.
- Signal processing.
- Statistical analysis.
- Model implementation.
○ Code repository available at: https://doi.org/10.5281/zenodo.13893640.
CRITICAL: All software versions should be documented and maintained consistently. Copying, transferring, and deleting any data must strictly adhere to hospital data security protocols.
• Visual Display Setup.
○ Portable screen5 for visual cue presentation. Slides should run smoothly.
CRITICAL: Portable devices should be fully charged before the operation.
Alternatives: Backup systems should be available and fully charged.
Step-by-step method details
Data acquisition and signal processing
Timing: 30–40 min per participant session
This major step covers the collection and preprocessing of neural and audio signals during awake surgery, ensuring high-quality data for subsequent analysis. The process includes ECoG recording, audio recording, and initial signal processing steps.
1. Grid placement and location marking:
a. After the dura is suspended and the brain surface is exposed, take an intraoperative photo of the brain surface before placement of the arrays.
b. Place two 128-channel high-density electrode arrays onto the ventral sensorimotor cortex (vSMC) under the guidance of the intraoperative navigation system.
c. Mark the positions of at least 4 typical electrodes per array on the preoperative T1 MRI using the BrainLab neuronavigation system.
d. Take another intraoperative photo with the electrode arrays in place to double-check the electrode positions.
e. Document the electrode coordinates in standardized space.
Alternatives: Photos taken with a camera or microscope before and after placement of the grids over the brain surface can serve as an alternative for grid localization, but they cannot replace the role of the navigation system in the clinical procedure.
2. Set up the ECoG recording system:
a. Connect the arrays to the Tucker-Davis Technologies ECoG system.
b. Set the sampling mode to ECoG and the sampling rate to 3052 Hz.
c. Mount a microphone for concurrent audio recording at 24414 Hz.
3. Perform visual and quantitative signal inspection:
a. Turn on "IDLE" mode and inspect the ECoG signals on each channel. Troubleshooting 1.
b. Identify and exclude bad channels with artifacts or excessive noise. Troubleshooting 2.
4. Record audio and synchronize:
a. Set up the mounted microphone for simultaneous recording.
b. Connect the microphone to the Tucker-Davis Technologies ECoG system.
c. Check proper synchronization between the ECoG and audio recordings.
5. Neural data recording.
a. Move the portable screen to 8–10 cm in front of the participant's eyes. Wake up the participant. Adjust the position, angle, and brightness of the screen until he/she can recognize the visual cues and sentences clearly and comfortably.
b. Switch to "RECORD" mode. Start presenting slides on the portable screen and instruct the participant to perform the language task.5
c. Verify audio and ECoG waveform quality throughout the recording session.
d. Carefully examine the correlation patterns and waveforms to identify potential contamination sources. Genuine neural signals should show appropriate, channel-specific physiological characteristics distinct from direct acoustic pickup. Troubleshooting 3.
e. After finishing each block, stop the recording and instruct the participant to rest for a while. Remind the participant to report any discomfort immediately. Troubleshooting 4.
6. Process ECoG signals (see the sketch after this step):
a. Downsample the ECoG signals to 400 Hz.
b. Extract the high-gamma (HG, 70–150 Hz) frequency component via the Hilbert transform.
c. Verify signal quality across all channels.
CRITICAL: All equipment that may enter the sterile field of the surgery must be sterilized according to standard surgical protocols before placement.
Note: Signal quality should be continuously monitored throughout the recording session to ensure data integrity.
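A minimal sketch of the step 6 preprocessing is shown below, assuming `raw` is an (n_channels, n_samples) array recorded at 3052 Hz. The single-band bandpass-plus-Hilbert-envelope approach is one common way to obtain the high-gamma amplitude; the original pipeline may differ in filter design and normalization.

import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample_poly

def extract_high_gamma(raw, fs_raw=3052, fs_target=400, band=(70.0, 150.0)):
    # a. Downsample to 400 Hz (anti-aliased rational resampling).
    down = resample_poly(raw, up=fs_target, down=fs_raw, axis=-1)
    # b. Band-pass 70-150 Hz, then take the Hilbert analytic amplitude.
    b, a = butter(4, [band[0] / (fs_target / 2), band[1] / (fs_target / 2)], btype="band")
    hg = np.abs(hilbert(filtfilt(b, a, down, axis=-1), axis=-1))
    # Z-score each channel so amplitudes are comparable across electrodes.
    return (hg - hg.mean(axis=-1, keepdims=True)) / hg.std(axis=-1, keepdims=True)

hg = extract_high_gamma(np.random.randn(256, 3052 * 10))   # 10 s of placeholder data
print(hg.shape)                                            # (256, 4000) samples at 400 Hz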
Phonetic and phonological annotation
Timing: 4–5 h per participant session
This step involves the manual annotation and transcription of audio recordings to create accurate labels for the neural decoding models. The transcription includes monosyllabic Chinese characters, syllables, and tone labels at the syllable level.
7. Audio recording preparation:
a. Open the audio recordings in Praat (Version 6.1.01).
b. Exclude any unexpected voicing unrelated to the language tasks.
c. Segment the recordings into individual syllables.
d. Segment each sentence.
CRITICAL: Unexpected voicing unrelated to the language task (such as communication with clinicians and heavy sighing) must be carefully excluded from the samples used to train the models.
8. Manual annotation:
a. Label each syllable with its syllable label and the corresponding Chinese character.
b. Mark the tone category (1–4) for each syllable.
c. Document syllable boundaries and timing information.
d. Verify fidelity to the participant's actual vocalizations.
9. Data organization (see the sketch after this step):
a. Create structured annotation files (.TextGrid files).
b. Cross-reference annotations with audio timestamps.
c. Prepare labels for model training.
CRITICAL: The annotations for a given participant must be performed by a single native Mandarin speaker to ensure consistent and accurate tone and syllable identification. Ensure proper alignment of neural signals with speech events.
Note: For whispering participants without a recognizable vocal pitch, mark the tone category based on the standard pronunciation.
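A minimal sketch of step 9c is shown below, assuming the Praat annotations have been exported to a CSV file with one row per syllable (columns onset_s, offset_s, syllable, tone, character); the file name and column names are illustrative, not part of the original pipeline.

import csv

FS_HG = 400  # sampling rate of the high-gamma signal after downsampling

def load_annotations(path="sub01_block1_annotations.csv"):
    events = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            events.append({
                "onset_sample": int(float(row["onset_s"]) * FS_HG),
                "offset_sample": int(float(row["offset_s"]) * FS_HG),
                "syllable": row["syllable"],       # one of the 10 base syllables
                "tone": int(row["tone"]),          # tone category 1-4
                "character": row["character"],     # ground-truth Chinese character
            })
    return events

# Example: events = load_annotations(); onsets = [e["onset_sample"] for e in events]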
Electrode categorization
Timing: 10 min per participant
This protocol identifies electrodes that show significant responses during speech production and further classifies them into tone-discriminative and syllable-discriminative categories through statistical testing.
10. Identify speech-responsive electrodes (see the sketch after this step):
a. Define the baseline window as −1800 ms to −400 ms before sentence onset.
b. Define the test window as −400 ms to 800 ms relative to consonant onsets.
c. For each electrode, perform a two-sample t-test comparing the HG values across all trials at each time point of the test window against all HG values from all trials falling in the baseline window.
d. Apply Bonferroni correction for multiple comparisons across:
i. The total number of electrodes.
ii. All time points.
e. Mark an electrode as speech-responsive if the results are significant (P < 0.01) for 40 consecutive time points (100 ms). Troubleshooting 5.
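A minimal sketch of the step 10 selection is shown below, assuming `hg_trials` is an (n_trials, n_electrodes, n_samples) array of high-gamma activity at 400 Hz spanning −1800 ms to +800 ms around consonant onsets; the indexing conventions and the use of the Bonferroni-corrected threshold are assumptions for illustration.

import numpy as np
from scipy.stats import ttest_ind

fs = 400
hg_trials = np.random.randn(60, 64, int(2.6 * fs))    # placeholder: replace with real epoched HG

t0 = int(1.8 * fs)                                     # sample index of the consonant onset
baseline = hg_trials[:, :, :int(1.4 * fs)]             # -1800 to -400 ms
test_win = hg_trials[:, :, t0 - int(0.4 * fs):t0 + int(0.8 * fs)]   # -400 to +800 ms

n_trials, n_elec, n_test = test_win.shape
alpha = 0.01 / (n_elec * n_test)                       # Bonferroni over electrodes and time points

speech_responsive = []
for e in range(n_elec):
    base_vals = baseline[:, e, :].ravel()              # all baseline samples, all trials
    pvals = np.array([ttest_ind(test_win[:, e, t], base_vals).pvalue for t in range(n_test)])
    sig = (pvals < alpha).astype(int)
    runs = np.convolve(sig, np.ones(40, dtype=int), mode="valid")    # 40 points = 100 ms
    if np.any(runs == 40):                             # significant for 100 ms straight
        speech_responsive.append(e)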
11. Identify tone-discriminative electrodes:
a. For each electrode, align the HG responses with syllable onsets.
b. Define the analysis window from −500 ms to 500 ms relative to onset (400 time points).
c. Perform a one-way ANOVA to assess differences across the 4 Mandarin tones at each time step.
d. Apply Bonferroni correction for electrodes and time points (P < 0.05).
e. Mark an electrode as tone-discriminative if >50% of the time points are significant within any 200 ms window.
12. Identify syllable-discriminative electrodes (see the sketch at the end of this section):
a. For each electrode, align the HG responses with syllable onsets.
b. Define the analysis window from −400 ms to 800 ms relative to onset (480 time points).
c. Perform a one-way ANOVA to assess differences across the 10 Mandarin syllables at each time step.
d. Apply the same significance criteria as in the tone-discriminative selection.
e. Mark an electrode as syllable-discriminative if >50% of the time points are significant within any 200 ms window.
CRITICAL: Only neural data from the training blocks can be used for electrode selection, to avoid potential overfitting.
Note: One electrode may belong to multiple categories (speech-responsive, tone-discriminative, and/or syllable-discriminative) based on its response properties.
The approximate computation time is calculated using two Intel Xeon 4314 processors.
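A minimal sketch of the ANOVA-based selection in steps 11 and 12 is shown below, assuming epoched high-gamma arrays aligned to syllable onsets and per-event tone or syllable labels; window lengths and thresholds follow the text, while the indexing details are assumptions.

import numpy as np
from scipy.stats import f_oneway

def discriminative_electrodes(hg_epochs, labels, alpha=0.05, fs=400, win_ms=200, frac=0.5):
    """hg_epochs: (n_events, n_electrodes, n_samples); labels: per-event tone or syllable ID."""
    n_events, n_elec, n_time = hg_epochs.shape
    corrected = alpha / (n_elec * n_time)          # Bonferroni over electrodes and time points
    win = int(win_ms / 1000 * fs)                  # 200 ms window = 80 samples at 400 Hz
    classes = np.unique(labels)
    selected = []
    for e in range(n_elec):
        pvals = np.array([
            f_oneway(*[hg_epochs[labels == c, e, t] for c in classes]).pvalue
            for t in range(n_time)
        ])
        sig = (pvals < corrected).astype(int)
        frac_sig = np.convolve(sig, np.ones(win), mode="valid") / win
        if np.any(frac_sig > frac):                # >50% significant within any 200 ms window
            selected.append(e)
    return selected

# Example: tone_elecs = discriminative_electrodes(hg_tone_epochs, tone_labels)   # -500..500 ms epochs
#          sylb_elecs = discriminative_electrodes(hg_sylb_epochs, sylb_labels)   # -400..800 ms epochs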
Cortical surface reconstruction and electrode visualization
Timing: 8–10 h per participant
13. Surface Reconstruction:
a. Convert the MRI T1 .dcm files to a .nii file.
b. Reconstruct the cortical surface from the .nii file using FreeSurfer (v.7.4.1).
c. Localize the electrodes and register them to the reconstructed surface using img_pipe (v.2.0).
CRITICAL: Ensure accurate alignment between the MRI and intraoperative photos.
14. Visualization:
a. Plot electrodes using colored discs to indicate categories:
i. Yellow: speech-responsive electrodes.
ii. Red: tone-discriminative electrodes.
iii. Blue: syllable-discriminative electrodes.
iv. Mixed colors: electrodes with combined features.
v. Small black dots: nonresponsive electrodes.
CRITICAL: Verify the accuracy of anatomical labeling, especially for cortices near the lesion.
b. Generate Venn diagrams showing:
i. The number of electrodes in each category.
ii. Overlapping features between categories.
c. Create hemisphere-specific visualizations:
i. Plot left or right hemisphere coverage.
ii. Map electrodes across participants onto the MNI152 brain template. Troubleshooting 6.
Note: This visualization protocol enables clear representation of electrode placement and functional categorization across participants while maintaining anatomical accuracy.
Speech detector training and hyperparameter tuning
Timing: 3 days per participant
15. Speech Detector Module.
a. Input: Neural signals from speech-responsive electrodes.
b. Time window: −0.25 s to +0.25 s around each time point.
c. Architecture:8
i. Initial 2D convolution layer with batch normalization.
ii. Max-pooling for temporal downsampling.
iii. Bidirectional GRU layers.
iv. Fully connected layer outputting speech/rest probabilities.
d. Save the initialized model as a .pt file:

import torch
import torch.nn as nn
from einops import rearrange

class CRNN(nn.Module):
    def __init__(self, *, duration, typeNum, in_chans, num_layers=4, gruDim=256, drop_out=0.5):
        super().__init__()
        self.conv1d = nn.Conv1d(in_channels=in_chans, out_channels=gruDim, kernel_size=3, stride=1, padding=0)
        self.leaky_relu = nn.LeakyReLU(negative_slope=0.01)
        self.max_pooling = nn.MaxPool1d(kernel_size=2, stride=None, padding=0)
        self.dropout = nn.Dropout(p=drop_out)
        gru_layers = []
        for i in range(num_layers):
            if i == 0:
                gru_layers.append(nn.GRU(gruDim, gruDim, 1, batch_first=True, bidirectional=True))
            else:
                gru_layers.append(nn.GRU(gruDim * 2, gruDim, 1, batch_first=True, bidirectional=True))
        # Create the sequential model with stacked GRU layers
        self.gru_layers = nn.Sequential(*gru_layers)
        elec_feature = int(2 * gruDim)
        self.fc1 = nn.Linear(elec_feature, typeNum)

    def forward(self, x):
        x = self.conv1d(x)
        x = self.leaky_relu(x)
        x = self.max_pooling(x)
        x = rearrange(x, 'batch electrodes duration -> batch duration electrodes')
        for gru_layer in self.gru_layers:
            x, _ = gru_layer(x)
            x = self.dropout(x)
        x = x[:, -1, :]
        x = self.fc1(x)
        return x

# Generate & save the speech detector
# (resp_elecs, subject, and device are assumed to be defined earlier in the pipeline)
overt_back = 0.25
overt_forward = 0.25
hz = 400
model1 = CRNN(duration=int((overt_back + overt_forward) * hz), typeNum=2, in_chans=len(resp_elecs), gruDim=256).to(device)
torch.save(model1, "./" + subject + '_onset.pt')
16. Training Process.
a. Batch size: 1024.
b. Optimizer: Adam (learning rate: 0.001).
c. Loss: Weighted cross-entropy based on the speech/rest ratio.
d. Early stopping:
i. After 10 epochs without improvement.
ii. Maximum 50 epochs total.
17. Post-processing Pipeline (see the sketch after this step).
a. Probability smoothing:
i. Apply a sliding window of size S.
ii. Smooth the decoded probability time course.
b. Thresholding:
i. Apply a probability threshold (Pt) to obtain binary values.
ii. Set speech = 1, rest = 0.
c. Time constraints:
i. Apply an off-time threshold (Toff) for the minimum silent period.
ii. Apply an on-time threshold (Ton) for the minimum speech period.
d. Error permissive rate (EPR):
i. Allow a small portion of incorrect predictions.
ii. Prevent multiple onsets within 0.5 s.
e. Save the decoded onsets as a .mat file.
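A minimal sketch of the step 17 post-processing is shown below, assuming `p_speech` is the frame-wise speech probability output by the detector at 400 Hz. The rules for applying S, Pt, Toff, Ton, and EPR are a simplified reading of the text; the exact logic in the released code may differ.

import numpy as np

def postprocess(p_speech, S=41, Pt=0.5, Toff=0.15, Ton=0.10, EPR=0.1, fs=400):
    smoothed = np.convolve(p_speech, np.ones(S) / S, mode="same")   # probability smoothing
    binary = (smoothed > Pt).astype(int)                            # speech = 1, rest = 0
    min_on, min_off = max(int(Ton * fs), 1), max(int(Toff * fs), 1)

    onsets, last_onset, rest_run = [], -np.inf, min_off
    for t, v in enumerate(binary):
        if v == 0:
            rest_run += 1
            continue
        seg = binary[t:t + min_on]
        # Accept an onset only after a silent period of at least Toff, if the next Ton
        # seconds are mostly speech (tolerating a fraction EPR of errors), and if no
        # other onset occurred within the last 0.5 s.
        if rest_run >= min_off and seg.mean() >= 1 - EPR and (t - last_onset) > 0.5 * fs:
            onsets.append(t)
            last_onset = t
        rest_run = 0
    return np.array(onsets) / fs      # onset times in seconds

# Example: onset_times = postprocess(np.random.rand(400 * 30))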
18. Hyperparameter Optimization (see the sketch at the end of this section).
a. Use the Hyperopt Python package for optimization.9 Iterate 500 times per participant.
b. Parameters to optimize:
i. Smoothing window size (S).
ii. Probability threshold (Pt).
iii. Off-time threshold (Toff).
iv. On-time threshold (Ton).
v. Error permissive rate (EPR).
c. Validation (compare predicted onsets with ground truth):
i. Use six-fold cross-validation for the training and testing process.
ii. Generate a zero vector with length equal to the number of sliding windows sent into the AI module.
iii. Set the values in the vector corresponding to the 0.5 s peri-onset time points from 0 to 1.
iv. Calculate the loss between the vector generated from the ground truth and the one generated from the predicted onsets. This loss guides the subsequent search iterations.
Note: This framework enables robust detection of speech onsets from continuous neural recordings while accounting for various temporal constraints and potential errors in the prediction process.
The approximate computation time is calculated using two Intel Xeon 4314 processors and one NVIDIA A100 80G GPU.
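A minimal sketch of the step 18 search using the Hyperopt package is shown below; the search ranges are illustrative, and `evaluate_onsets` stands in for the six-fold cross-validation loss described in step 18c.

from hyperopt import fmin, tpe, hp, Trials

space = {
    "S":    hp.quniform("S", 11, 201, 2),     # smoothing window size (samples)
    "Pt":   hp.uniform("Pt", 0.3, 0.9),       # probability threshold
    "Toff": hp.uniform("Toff", 0.05, 0.5),    # minimum silent period (s)
    "Ton":  hp.uniform("Ton", 0.05, 0.5),     # minimum speech period (s)
    "EPR":  hp.uniform("EPR", 0.0, 0.3),      # error permissive rate
}

def evaluate_onsets(params):
    # Placeholder objective: replace with the six-fold cross-validation loss comparing
    # the vectors built from ground-truth and predicted onsets (step 18c).
    return (params["Pt"] - 0.6) ** 2

trials = Trials()
best = fmin(fn=evaluate_onsets, space=space, algo=tpe.suggest,
            max_evals=500, trials=trials)     # 500 iterations per participant
print(best)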
Tone and syllable decoder training and hyperparameter tuning
Timing: 3 days per participant
This step details the construction and optimization of neural network decoders for tone and syllable prediction from ECoG signals. The process includes model architecture setup and hyperparameter tuning using cross-validation approaches.
19. Decoder Architecture Setup.
a. Input processing:
i. Tone decoder: use 0.2 s before to 0.6 s after the manually aligned speech onset.
ii. Syllable decoder: use 0.4 s before to 0.8 s after the manually aligned speech onset.
b. Create an ensemble of 10 neural networks with identical architecture but different training/validation splits.
c. Implement the network layers:
i. Initial 2D convolution with batch normalization.
ii. ELU activation function.
iii. Max-pooling for temporal downsampling.
iv. Bidirectional GRU layers.
v. Fully connected output layer.
d. Save the initialized models as .pt files:
class timespatCNNRNN(nn.Module):
    def __init__(self, *, duration, typeNum, in_chans,
                 n_filters_time, filter_time_length, n_filters_spat, conv_stride,
                 pool_time_length, pool_stride, n_filters, filter_length,
                 n_CNN_layer, gruDim, gruLayer, drop_out):
        super().__init__()
        self.conv_timespat = nn.Conv2d(
            1, n_filters_spat, (filter_time_length, in_chans), stride=(conv_stride, 1),)
        self.bnorm = nn.BatchNorm2d(n_filters_spat, affine=True, eps=1e-5)
        self.elu = nn.ELU()
        self.pool = nn.MaxPool2d(kernel_size=(pool_time_length, 1), stride=(pool_stride, 1))
        self.conv_pool_block = nn.ModuleList()
        self.conv_pool_block.append(nn.Dropout(p=drop_out))
        self.conv_pool_block.append(nn.Conv2d(
            n_filters_spat, n_filters, (filter_length, 1),
            stride=(conv_stride, 1), padding=(((filter_length - 1) * conv_stride) // 2, 0)
        ))
        for i in range(n_CNN_layer - 1):
            self.conv_pool_block.append(nn.Dropout(p=drop_out))
            self.conv_pool_block.append(nn.Conv2d(
                n_filters, n_filters, (filter_length, 1),
                stride=(conv_stride, 1), padding=(((filter_length - 1) * conv_stride) // 2, 0)
            ))
        self.conv_pool_block.append(nn.BatchNorm2d(n_filters, momentum=0.1, affine=True, eps=1e-5,))
        self.conv_pool_block.append(nn.ELU())
        self.conv_pool_block.append(nn.MaxPool2d(kernel_size=(pool_time_length, 1), stride=(pool_stride, 1),))
        self.gru1 = nn.GRU(n_filters, gruDim, gruLayer, batch_first=True, bidirectional=True)
        elec_feature = int(2 * gruDim)
        self.fc1 = nn.Linear(elec_feature, typeNum)

    def forward(self, x):
        x = rearrange(x, '(batch 1) electrodes duration -> batch 1 duration electrodes')
        x = self.conv_timespat(x)
        x = self.bnorm(x)
        x = self.elu(x)
        x = self.pool(x)
        for block in self.conv_pool_block:
            x = block(x)
        x = rearrange(x, 'batch filter duration 1 -> batch duration filter')
        x = self.gru1(x)[0][:, -1, :]
        x = self.fc1(x)
        return x

# Generate & save the syllable decoder
sylb_back = 0.4
sylb_forward = 0.8
hz = 400
model2 = timespatCNNRNN(duration=int((sylb_back + sylb_forward) * hz), typeNum=10, in_chans=len(sylb_elecs),
                        n_filters_time=n_filters, filter_time_length=filter_time_length,
                        n_filters_spat=n_filters, conv_stride=conv_stride,
                        pool_time_length=pool_time_length, pool_stride=pool_stride,
                        n_filters=n_filters, filter_length=filter_length, n_CNN_layer=n_CNN_layer,
                        gruDim=gruDim, gruLayer=gruLayer, drop_out=drop_out).to(device)
torch.save(model2, "./" + subject + '_sylb.pt')

# Generate & save the tone decoder
tone_back = 0.2
tone_forward = 0.6
hz = 400
model3 = timespatCNNRNN(duration=int((tone_back + tone_forward) * hz), typeNum=4, in_chans=len(tone_elecs),
                        n_filters_time=n_filters, filter_time_length=filter_time_length,
                        n_filters_spat=n_filters, conv_stride=conv_stride,
                        pool_time_length=pool_time_length, pool_stride=pool_stride,
                        n_filters=n_filters, filter_length=filter_length, n_CNN_layer=n_CNN_layer,
                        gruDim=gruDim, gruLayer=gruLayer, drop_out=drop_out).to(device)
torch.save(model3, "./" + subject + '_tone.pt')
20. Training Process Configuration (see the sketch after this step).
a. Set the batch size to 4.
b. Configure the Adam optimizer with a learning rate of 0.001.
c. Implement the cross-entropy loss:
i. Weighted loss for the tone decoder.
ii. Standard loss for the syllable decoder (balanced distribution).
d. Set the early stopping criteria:
i. No improvement in validation loss for 50 epochs.
ii. Maximum 1000 epochs total.
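A minimal sketch of the step 20 training configuration for a single ensemble member is shown below, assuming `train_ds` and `val_ds` are PyTorch Datasets yielding (neural window, label) pairs and `model` is one of the decoders defined above; class weights and device handling are illustrative.

import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_decoder(model, train_ds, val_ds, class_weights=None, device="cuda"):
    train_loader = DataLoader(train_ds, batch_size=4, shuffle=True)   # batch size 4
    val_loader = DataLoader(val_ds, batch_size=4)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)        # Adam, learning rate 0.001
    criterion = nn.CrossEntropyLoss(weight=class_weights)             # weighted loss for the tone decoder

    best_loss, best_state, patience = float("inf"), None, 0
    for epoch in range(1000):                                         # maximum 1000 epochs
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x.to(device)), y.to(device))
            loss.backward()
            optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x.to(device)), y.to(device)).item()
                           for x, y in val_loader) / max(len(val_loader), 1)
        if val_loss < best_loss:
            best_loss, best_state, patience = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            patience += 1
            if patience >= 50:                                        # early stopping after 50 stale epochs
                break
    model.load_state_dict(best_state)
    return model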
21. Hyperparameter Optimization.
a. Use the Hyperopt Python package for optimization.9
b. Parameters to optimize:
i. Initial convolution filter length (FLini).
ii. Convolution stride (STconv).
iii. Max-pooling kernel size (Lpool).
iv. Other architecture-specific parameters.
c. Evaluation process: Perform six-fold cross-validation and calculate the total decoding loss and accuracy.
d. The decoding accuracies guide the search for the optimized hyperparameter combination. Test 500 different parameter combinations in total.
Note: The optimization process should be performed separately for each participant to account for individual variations in neural patterns and speech characteristics.
The approximate computation time is calculated using one NVIDIA A100 80G GPU.
CRITICAL: Ensure proper electrode selection (tone-discriminative and syllable-discriminative) before model training to optimize decoder performance.
CRITICAL: In the optimization process, modules with different hyperparameter combinations should be tested on both manually aligned onsets and decoded onsets derived from the optimized speech detector.
Language model training and integration
Timing: 6–8 min
This step details the construction and optimization of a natural language model to convert decoded tonal syllables into meaningful Chinese sentences. The process includes corpus collection, model training, and integration with the Viterbi decoder for final sentence generation.
22. Corpus Collection and Preparation.
a. Extract training data from the CCL corpus.
b. Measure transfer counts between phrases:
i. Set the cutoff threshold at 512 counts.
ii. Apply ninth-root normalization.
c. Incorporate transfer probabilities from the task corpus.
d. Extract n-grams (n ∈ {1, 2, 3, 4}) from the task corpus phrases.10
23. Language Model Training.10
a. Calculate transition probabilities between Chinese characters.
b. Incorporate contextual information from previous characters.
c. Train the model on the CCL corpus collection.
d. Integrate task-specific transfer probabilities.
24. Viterbi Decoder Implementation.10
a. Set up the decoder to process probability sets:
i. Input: tonal-syllable likelihoods.
ii. Output: Chinese character sequences.
b. Configure the decoder parameters.
c. Implement the maximum a posteriori probability calculation.
25. Integration with Neural Decoders (see the sketch at the end of this section).
a. Connect the tone decoder output.
b. Connect the syllable decoder output.
c. Implement combined probability processing:
i. Multiply the tone decoder output by the syllable decoder output and normalize to obtain tonal-syllable probabilities.
ii. Map the tonal-syllable probabilities to Chinese character probabilities; homophones share averaged probabilities.
d. Generate the final sentence predictions from the character-wise probabilities using the n-gram language model and the Viterbi decoder.
CRITICAL: Ensure proper balance between external and task-specific probabilities to avoid overfitting to either dataset.
Note: The language model should account for the one-to-many mapping between tonal syllables and Chinese characters, as many Chinese characters are homophones sharing the same pronunciation. Other language models, such as large language models (LLMs), can also be used to translate the tonal-syllable probabilities into the final output text.
The approximate computation time is calculated using two Intel Xeon 4314 processors.
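A minimal sketch of the probability combination and Viterbi search described in steps 22–25 is shown below. The homophone dictionary and bigram table are illustrative placeholders rather than the CCL-derived model, and a bigram is used instead of the full n-gram model for brevity.

import numpy as np

def combine(tone_probs, sylb_probs):
    """tone_probs: (T, 4), sylb_probs: (T, 10) -> normalized tonal-syllable probs (T, 40)."""
    tonal = np.einsum("ti,tj->tij", sylb_probs, tone_probs).reshape(len(tone_probs), 40)
    return tonal / tonal.sum(axis=1, keepdims=True)

def to_char_probs(tonal_probs, homophones):
    """Spread each tonal-syllable probability evenly over its homophonous characters."""
    chars = sorted({c for cs in homophones.values() for c in cs})
    out = np.zeros((len(tonal_probs), len(chars)))
    for j, cs in homophones.items():               # j: tonal-syllable index, cs: its characters
        for c in cs:
            out[:, chars.index(c)] += tonal_probs[:, j] / len(cs)
    return chars, out

def viterbi(char_probs, chars, bigram, lm_weight=1.0):
    """Maximum a posteriori character sequence under a bigram language model."""
    T, V = char_probs.shape
    logp = np.log(char_probs + 1e-12)
    trans = lm_weight * np.log(bigram + 1e-12)      # (V, V) transition log-probabilities
    delta = logp[0].copy()
    back = np.zeros((T, V), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + trans + logp[t][None, :]
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0)
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return "".join(chars[i] for i in reversed(path))

# Example: chars, char_probs = to_char_probs(combine(tone_out, sylb_out), homophones)
#          sentence = viterbi(char_probs, chars, bigram)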
Expected outcomes
Researchers implementing this brain-to-text framework can expect several key outcomes.
First, researchers will be able to identify and categorize different types of electrodes based on their responses, including speech-responsive electrodes, tone-discriminative electrodes, and syllable-discriminative electrodes, with some electrodes potentially showing combined features.
Through the optimization process, researchers will obtain optimized hyperparameters for various components of the system, including the speech detection modules, tone decoder, and syllable decoder, which are crucial for accurate decoding.
The framework will generate predicted onset times for speech events, which serve as critical temporal markers for subsequent decoding steps. For each predicted onset time, the system will produce probability distributions for both tones and syllables, reflecting the likelihood of different linguistic elements.
Finally, researchers can expect to obtain the final text output of each decoded sentence, which represents the end result of the complete decoding pipeline after integrating all the individual components and applying language modeling.
These outcomes collectively enable the assessment of the system’s performance at both component and system-wide levels.
Quantification and statistical analysis
1. We analyze overall decoding performance using word error rates (WERs) (see the sketch after this step):
a. WERs are calculated between target and decoded sentences, specifically at the level of the 40 Chinese characters rather than the 29 Chinese words. This provides a granular assessment of decoding accuracy.
b. To assess model robustness, the calculated WERs should be compared against permuted baselines (syllable and tone decoders trained with real data but randomly shuffled labels).
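A minimal sketch of the character-level WER computation is shown below, using the standard edit distance between target and decoded character strings (Chinese strings iterate character by character).

def wer(target, decoded):
    """(substitutions + insertions + deletions) / target length."""
    n, m = len(target), len(decoded)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if target[i - 1] == decoded[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[n][m] / max(n, 1)

print(wer("我要喝水", "我要喝茶"))   # one substitution out of four characters -> 0.25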
2. Classification performance of individual modules is evaluated through:
a. Area under the curve (AUC) for speech detection accuracy.
b. Confusion matrices for tone classification accuracy (across 4 tones).
c. Confusion matrices for syllable classification accuracy (across 10 base syllables).
d. Confusion matrices for combined tonal-syllable accuracy (across 40 possible combinations).
e. Confusion matrices for Chinese character accuracy after language model integration.
The validation framework ensures comprehensive assessment of both individual components (speech detection, tone classification, syllable classification) and the integrated system performance. Results are averaged across multiple trials and participants to account for inter-subject variability.
Limitations
Several key limitations may affect the applicability and generalization of this protocol to clinical settings. First, the study was conducted using overt speech tasks from participants with intact speech, which does not fully represent potential users with speech impairments. While audio recordings were not directly used to train the model, the protocol relies on labeling speech pauses from individuals, which would be challenging to implement with patients with complete anarthria.11,12,13,14 This limitation would require users to speak in a similar halting manner, complicating the protocol's implementation for individuals with anarthria.
Second, only offline analyses were conducted in this work. Although considerations were made for potential online-decoding challenges (such as using lightweight models and three separate decoders at the single-Chinese-character level), the research may not accurately reflect performance in real-time applications. Additional steps would be necessary to implement this decoding strategy in real-time settings and with unseen sentences.
Third, the protocol has significant vocabulary constraints, utilizing only 40 tonal syllables derived from 4 tones and 10 syllables. While this covers nearly 25% of frequently used Chinese characters, it represents only a small subset of the total 1,664 tonal syllables available in Mandarin. Previous research has indicated that vocabulary size significantly influences speech decoding accuracy, suggesting that maintaining high decoding accuracy with an expanded vocabulary would require substantially more training data.12,13,14
Finally, the overall accuracy of this protocol, when compared to published results from non-tonal language decoding, is relatively low.13,14 This limitation may be attributed to both the limited training samples and the constraints of the implemented language model. Additionally, the intraoperative nature of this protocol does not adequately account for potential fluctuations in neural activity within the same individual over time, primarily because of the limited sampling duration of approximately 30 min in our design. Variations in daily physical state, mood, electrode displacement within the cortex, and immune responses affecting the electrodes may significantly alter the collected neural signals. These fluctuations may impact the performance and reliability of the decoding framework, raising concerns about its robustness in real-world applications. Therefore, further research is essential to enhance the resilience of the decoding protocol. When adapting this intraoperative, offline protocol into a chronically implanted, online-decoding one, future efforts should focus on investigating and validating the framework's resilience to these intra-individual changes to ensure its effectiveness in diverse and dynamic clinical settings.
Troubleshooting
Problem 1
Noises across all channels during ECoG recording (related to Step 3a). Signal quality in ECoG recordings can be compromised by various sources of environmental and physiological interference. These electrical disturbances can manifest as noise across all recording channels, potentially masking or distorting the neural signals of interest. To ensure high-quality recordings and reliable data collection, it is crucial to identify and address potential noise sources through systematic troubleshooting approaches.
Potential solution
• Environmental noise control: A systematic assessment and control of the recording environment is essential. Power down or switch to low-power mode all nearby medical equipment that could generate electromagnetic interference, including MRI scanners, electrophysiology rigs, surgical navigation systems, operating room lights, and energy platforms. Maintaining distance between the recording setup and potential sources of electrical interference can significantly improve signal quality. Regular monitoring and documentation of environmental conditions can help identify patterns in noise occurrence.
• Reference electrode optimization: Inadequate patient grounding can result in common-mode interference across all channels. To address this, implement additional reference electrodes by placing needle-type electrodes in the patient's scalp and connecting them to the ECoG system's external reference. This approach can effectively reduce noise by providing a more stable reference point and improving the overall signal-to-noise ratio of the recordings.
Problem 2
Large number of channels without waveforms during ECoG recording (related to Step 3b). During ECoG recordings, the absence of detectable waveforms from multiple electrodes can significantly compromise data quality and experimental outcomes. This issue may arise from poor electrode-tissue contact, improper placement, or electrode malfunction. When a substantial number of electrodes fail to record neural signals, it reduces the spatial coverage of brain activity monitoring and may lead to incomplete or unreliable data collection. Addressing this challenge is crucial for maintaining the integrity and comprehensiveness of neural recordings.
Potential solution
• Contact optimization: Proper electrode-tissue contact is fundamental for successful signal acquisition. Ensure electrodes are securely positioned against the brain surface. When necessary, use moistened surgical gauze to improve and maintain contact between the electrodes and brain tissue.
• Impedance verification: Systematic impedance testing of each electrode is crucial for identifying potential connection issues. Use SYNAPSE to measure and document the impedance values for all electrodes, and ensure they fall within the manufacturer's specified range.
• Grid and cable assessment: If electrodes with high impedance are aligned in a linear pattern, it may indicate a faulty cable connection that requires replacement. Inspect the cables for any signs of wear or damage and replace them as needed to restore proper connectivity. If the high-impedance electrodes are distributed randomly without a discernible pattern, consider replacing the entire electrode grid to ensure consistent and reliable signal acquisition across all channels.
Problem 3
High correlation between the sound recording and ECoG signals (related to Step 5d).
During ECoG recordings, acoustic contamination of electrophysiological signals is a critical concern that can compromise the validity of neural decoding results. When ECoG signals show high correlation with simultaneous audio recordings, it suggests potential acoustic artifacts rather than genuine neural activity. This contamination can occur through mechanical coupling or electrical interference, leading to false interpretations of neural responses to speech tasks. This issue requires careful validation to ensure the decoded signals truly represent neural activity rather than acoustic artifacts.
Potential solution
• Correlation analysis (see the sketch after this list): Calculate correlations between power in ECoG frequency bands (81 bands linearly distributed from 0 to 400 Hz) and power in sound frequency bands (81 bands linearly distributed from 0 to 400 Hz) using short-time Fourier transform (STFT) analysis. This systematic approach helps identify any suspicious correlations across different frequency ranges.15,16
• Time-lag assessment: Plot the maximum correlation across frequency bands of sound and ECoG signals against different time lags between them. The resulting curve shows how the correlation changes as ECoG signals are shifted in time relative to sound signals, helping distinguish true neural responses from acoustic contamination.15,16
• Channel exclusion: If specific channels show persistent high correlations with acoustic signals, consider excluding them from further analysis to maintain data integrity and ensure decoded signals represent true neural activity.
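A minimal sketch of the correlation analysis is shown below, assuming a single ECoG channel and the audio signal have been resampled to a common rate; with fs = 800 Hz and nperseg = 160, the STFT yields 81 linearly spaced bands from 0 to 400 Hz. Window length and lag range are illustrative.

import numpy as np
from scipy.signal import stft

def band_power(x, fs=800, nperseg=160):
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)      # 81 bands, 0-400 Hz, 5 Hz spacing
    return np.abs(Z) ** 2                          # (n_bands, n_frames)

def max_band_correlation(ecog, audio, fs=800, max_lag=20):
    P_neural, P_audio = band_power(ecog, fs), band_power(audio, fs)
    n = min(P_neural.shape[1], P_audio.shape[1])
    lags, curve = range(-max_lag, max_lag + 1), []
    for lag in lags:                               # shift ECoG power relative to audio power
        a = P_neural[:, max(lag, 0):n + min(lag, 0)]
        b = P_audio[:, max(-lag, 0):n - max(lag, 0)]
        rs = [np.corrcoef(a[k], b[k])[0, 1] for k in range(a.shape[0])]
        curve.append(np.nanmax(rs))
    return list(lags), curve                       # a sharp peak near zero lag across many
                                                   # bands suggests acoustic contamination

# Example: lags, curve = max_band_correlation(ecog_channel, audio_signal)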
Problem 4
Participants experience thirst during the speech task (related to Step 5e).
While completing the speech task, participants may report thirst, which can lead to discomfort or potential complications such as vomiting or aspiration if they drink normal amounts of water intraoperatively. Managing hydration carefully is essential to ensure participant comfort and safety while maintaining the integrity of the speech task recordings.
Potential solution
• Lip hydration: Regularly check the moisture level of the participant's lips. Use a cotton swab moistened with drinkable water to gently wipe their lips, providing temporary relief from dryness without significant water intake. This method helps maintain comfort while minimizing the risk of excessive fluid consumption.
• Controlled oral hydration: Administer small amounts of drinkable water, approximately 5–10 mL, using a syringe to moisten the buccal mucosa. This approach allows for controlled hydration, reducing the risk of vomiting or aspiration while alleviating thirst. Ensure that the participant remains comfortable and monitor their response to the hydration method.
• Task facilitation: Encourage participants to continue with the speech task by adjusting the microphone placement closer to them. Instruct participants that they do not need to speak loudly if the sound recording is already clear and recognizable. This adjustment can help reduce strain and discomfort, allowing participants to complete the task more comfortably.
Problem 5
The number of speech-responsive electrodes is less than 25% of all electrodes (related to Step 10e).
Low percentage of speech-responsive electrodes can significantly impact the quality and reliability of neural decoding. This issue may arise from various factors including electrode placement, signal quality, and identification criteria. Speech-responsive electrodes are crucial for detecting speech onset/offset and decoding speech elements, so having too few responsive electrodes could compromise the overall performance of the brain-to-text framework.
Potential solution
• Verify that syllable segmentation is correct: Ensure accurate manual annotation at the syllable level using Praat (Version 6.1.01) by native speakers who can properly identify syllable boundaries in Mandarin Chinese speech. Continuous or overlapping pronunciations must be excluded.
• Remove unrelated voicing: Systematically exclude unexpected voicing unrelated to the language tasks (such as communication with clinicians) from the samples used in training models to maintain data quality.
• Double-check that the Praat annotation is done by one person (ensuring a single criterion for judging onset boundaries from the spectrogram or by listening): Consistent annotation criteria are crucial, which is why manual annotation is performed by a single native speaker to ensure fidelity to the participants' actual vocalizations.
• Check for noise in the spectrogram or in the raw ECoG traces, and discard clips polluted by such noise.
Problem 6
Failure to map electrodes onto the MNI152 brain template (related to Step 14c ii).
Potential solution
Exclude electrodes with unrecognizable anatomical labels during the preceding processing steps (Step 11c iv).
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Yuanning Li (liyn2@shanghaitech.edu.cn).
Technical contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the technical contact, Yuanning Li (liyn2@shanghaitech.edu.cn).
Materials availability
This study did not generate unique reagents.
Data and code availability
• The raw datasets supporting the current study have not been deposited in a public repository because they contain personally identifiable patient information, but they are available in an anonymized form from the lead contact on reasonable request.
• All original code to replicate the main findings of this study can be found at https://doi.org/10.5281/zenodo.13893640.
• Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Acknowledgments
Y. Li is supported by the Lingang Laboratory under grant LG-GG-202402-06, the National Natural Science Foundation of China General Program (32371154), Shanghai Rising-Star Program (24QA2705500), and Shanghai Pujiang Program (22PJ1410500). J.L. is supported by STI 2030-Major Projects (2022ZD0212300) and the National Natural Science Foundation of China General Program (32371146). This project is also supported by the Innovation Program of Shanghai Municipal Education Commission (2023ZKZD13) and Fuqing Program of Shanghai Medical School of Fudan University. The computations in this research were performed using the CFFF platform of Fudan University and supported by the HPC Platform of ShanghaiTech University. We thank the Medical Science Data Center of Fudan University for the data analysis support.
Author contributions
Y. Li and J.L. conceived and supervised the project. Y. Li, J.L., D.Z., Y. Liu, and Z.Z. designed the experiment. J.L., D.Z., Y.Q., and Z.Z. collected the data. Y. Li and D.Z. designed the neural network. Y. Li, Z.W., and D.Z. designed the language model. D.Z., Z.W., Y. Li, and J.L. wrote and revised the manuscript. All authors reviewed and approved the manuscript.
Declaration of interests
The authors declare no competing interests.
Contributor Information
Junfeng Lu, Email: junfeng_lu@fudan.edu.cn.
Yuanning Li, Email: liyn2@shanghaitech.edu.cn.
References
- 1. Zhang D., Wang Z., Qian Y., Zhao Z., Liu Y., Hao X., Li W., Lu S., Zhu H., Chen L., et al. A brain-to-text framework for decoding natural tonal sentences. Cell Rep. 2024;43. doi: 10.1016/j.celrep.2024.114924.
- 2. Zhan W., Guo R., Chang B., Chen Y., Chen L. The building of the CCL corpus: its design and implementation. Yuliaoku Yuyanxue. 2019;1:71–86.
- 3. Desikan R.S., Ségonne F., Fischl B., Quinn B.T., Dickerson B.C., Blacker D., Buckner R.L., Dale A.M., Maguire R.P., Hyman B.T., et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage. 2006;31:968–980. doi: 10.1016/j.neuroimage.2006.01.021.
- 4. Fischl B., Van Der Kouwe A., Destrieux C., Halgren E., Ségonne F., Salat D.H., Busa E., Seidman L.J., Goldstein J., Kennedy D., et al. Automatically parcellating the human cerebral cortex. Cereb. Cortex. 2004;14:11–22. doi: 10.1093/cercor/bhg087.
- 5. Hameed N.U.F., Zhao Z., Zhang J., Bu L., Zhou Y., Jin L., Bai H., Li W., Tang J., Lu J., et al. A novel intraoperative brain mapping integrated task-presentation platform. Oper. Neurosurg. 2021;20:477–483. doi: 10.1093/ons/opaa476.
- 6. Hamilton L.S., Chang D.L., Lee M.B., Chang E.F. Semi-automated anatomical labeling and inter-subject warping of high-density intracranial recording electrodes in electrocorticography. Front. Neuroinform. 2017;11:62. doi: 10.3389/fninf.2017.00062.
- 7. Bouchard K.E., Mesgarani N., Johnson K., Chang E.F. Functional organization of human sensorimotor cortex for speech articulation. Nature. 2013;495:327–332. doi: 10.1038/nature11911.
- 8. Lawhern V.J., Solon A.J., Waytowich N.R., Gordon S.M., Hung C.P., Lance B.J. EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural. Eng. 2018;15. doi: 10.1088/1741-2552/aace8c.
- 9. Bergstra J., Komer B., Eliasmith C., Yamins D., Cox D.D. Hyperopt: a python library for model selection and hyperparameter optimization. Comput. Sci. Discov. 2015;8.
- 10. Moses D.A., Metzger S.L., Liu J.R., Anumanchipalli G.K., Makin J.G., Sun P.F., Chartier J., Dougherty M.E., Liu P.M., Abrams G.M., et al. Neuroprosthesis for decoding speech in a paralyzed person with anarthria. N. Engl. J. Med. 2021;385:217–227. doi: 10.1056/NEJMoa2027540.
- 11. Moses D.A., Metzger S.L., Liu J.R., Anumanchipalli G.K., Makin J.G., Sun P.F., Chartier J., Dougherty M.E., Liu P.M., Abrams G.M., et al. Neuroprosthesis for decoding speech in a paralyzed person with anarthria. N. Engl. J. Med. 2021;385:217–227. doi: 10.1056/NEJMoa2027540.
- 12. Willett F.R., Kunz E.M., Fan C., Avansino D.T., Wilson G.H., Choi E.Y., Kamdar F., Glasser M.F., Hochberg L.R., Druckmann S., et al. A high-performance speech neuroprosthesis. Nature. 2023;620:1031–1036. doi: 10.1038/s41586-023-06377-x.
- 13. Card N.S., Wairagkar M., Iacobacci C., Hou X., Singer-Clark T., Willett F.R., Kunz E.M., Fan C., Vahdati Nia M., Deo D.R., et al. An accurate and rapidly calibrating speech neuroprosthesis. N. Engl. J. Med. 2024;391:609–618. doi: 10.1056/NEJMoa2314132.
- 14. Metzger S.L., Littlejohn K.T., Silva A.B., Moses D.A., Seaton M.P., Wang R., Dougherty M.E., Liu J.R., Wu P., Berger M.A., et al. A high-performance neuroprosthesis for speech decoding and avatar control. Nature. 2023;620:1037–1046. doi: 10.1038/s41586-023-06443-4.
- 15. Roussel P., Godais G.L., Bocquelet F., Palma M., Hongjie J., Zhang S., Giraud A.-L., Mégevand P., Miller K., Gehrig J., et al. Observation and assessment of acoustic contamination of electrophysiological brain signals during speech production and sound perception. J. Neural. Eng. 2020;17. doi: 10.1088/1741-2552/abb25e.
- 16. Bush A., Chrabaszcz A., Peterson V., Saravanan V., Dastolfo-Hromack C., Lipski W.J., Richardson R.M. Differentiation of speech-induced artifacts from physiological high gamma activity in intracranial recordings. Neuroimage. 2022;250. doi: 10.1016/j.neuroimage.2022.118962.
