Figure 2: Pipeline components and workflow.
(A) The user interface allows researchers to upload electrophysiology recordings to cloud storage, initiate data-processing jobs, receive notifications upon completion, and download results to their local computers. (B) The MQTT-based job scanner service monitors job status on the NRP, sends a message to the listener to start the next job, and notifies users. (C) The MQTT-based job listener service subscribes to specific topics to run data-processing jobs. When it receives a message, it parses the JSON payload to extract the experiment identifier and computing requirements, then deploys the job to the NRP through the Python-Kubernetes API. Both the scanner and listener services write their status to S3 log files on a schedule. (D) S3 file structure for service logging and experiment data. Log files are human-readable text files that track service status. Experiment data are stored in batches, each with a unique identifier (UUID), a metadata file, an “original” bucket for experiment data, and a “derived” bucket for analysis outputs. (E) Computing cluster (NRP) that runs containerized jobs using Kubernetes. (F) The analysis container for batch processing loads electrophysiology recordings, runs spike sorting and autocuration algorithms, produces visualization figures, and generates NumPy files for single units. (G) Kubernetes configuration for deploying jobs to the computing cluster.
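The listener step in panel (C) can be sketched as follows. This is a minimal, self-contained illustration, not the pipeline's actual implementation: the JSON message schema (`experiment_uuid`, `cpu`, `memory`, `gpu` fields), the container image name, and the Job naming convention are all hypothetical. A real listener would receive the payload via an MQTT subscription (e.g., with `paho-mqtt`) and submit the manifest through the Python-Kubernetes API (`kubernetes.client.BatchV1Api().create_namespaced_job`); here the manifest is built as a plain dict so the parsing and deployment logic is visible without a cluster.

```python
import json


def parse_job_message(payload: str) -> dict:
    """Parse a JSON job message from the scanner (hypothetical schema)."""
    msg = json.loads(payload)
    return {
        "uuid": msg["experiment_uuid"],
        "cpu": msg.get("cpu", "4"),
        "memory": msg.get("memory", "16Gi"),
        "gpu": msg.get("gpu", "0"),
    }


def build_job_manifest(job: dict, image: str = "example/spike-sorter:latest") -> dict:
    """Build a Kubernetes batch/v1 Job manifest as a plain dict.

    The image name and argument names are placeholders; the real
    listener would pass this to the Python-Kubernetes API.
    """
    resources = {"cpu": job["cpu"], "memory": job["memory"]}
    if job["gpu"] != "0":
        # GPU requests use the NVIDIA device-plugin resource name.
        resources["nvidia.com/gpu"] = job["gpu"]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        # Name the Job after the first UUID segment so runs are traceable.
        "metadata": {"name": f"sort-{job['uuid'][:8]}"},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "sorter",
                            "image": image,
                            "args": ["--experiment", job["uuid"]],
                            "resources": {"requests": resources, "limits": resources},
                        }
                    ],
                }
            },
        },
    }


payload = (
    '{"experiment_uuid": "3f1a9c2e-0000-4000-8000-abcdef123456",'
    ' "cpu": "8", "memory": "32Gi", "gpu": "1"}'
)
manifest = build_job_manifest(parse_job_message(payload))
print(manifest["metadata"]["name"])  # sort-3f1a9c2e
```

Keeping requests equal to limits gives the job a guaranteed quality-of-service class on the cluster, which is a common choice for batch spike-sorting workloads with predictable resource needs.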
