Abstract
The study of human sperm motility has been a topic of interest for decades due to its crucial role in fertility and reproductive health. While most analyses rely on 2D+t imaging of head trajectories, sperm naturally swim in three dimensions (3D), driven by complex flagellar motion. However, the lack of comprehensive 3D+t datasets has limited progress in this field. To address this, we present 3D-SpermFlagella, the first large-scale 3D+t dataset of human sperm flagellum centerline annotations. This dataset contains 135 tracked and annotated sperm, derived from our previously published multifocal video microscopy dataset 3D-SpermVid. Each flagellar centerline was annotated over time in three dimensions, for sperm incubated under non-capacitating (NCC) and capacitating (CC) conditions. The (x,y,z) coordinates are provided in both micrometers and voxels, making 3D-SpermFlagella a valuable resource for studying sperm motility in its full spatial complexity and for the development and benchmarking of AI-based models for tracking and segmentation. In this paper, we describe the segmentation and tracking methods, as well as the conditions and structure of the dataset.
Subject terms: Scientific data, Cell biology, Computational science, Applied mathematics
Background & Summary
Understanding the intricate dynamics of sperm motility is paramount in the fields of reproductive biology and fertility research1–15. Over the years, advancements in imaging technologies have revolutionized our ability to capture and analyze sperm behavior, leading to the creation of various databases aimed at cataloging and elucidating the complexities of sperm function3,5,6,11,12,16–21. Despite providing valuable insights, existing databases often present limitations that hinder their usability and accessibility.
In the landscape of existing databases of sperm morphology and motility, several repositories have been published, each offering unique insights into different facets of sperm physiology. While numerous datasets are available, they primarily focus on morphological characteristics through static 2D images, with only a few incorporating temporal information as a sequence of 2D frames. One of these examples is the HSMA-DS (Human Sperm Morphology Analysis DataSet), which provides annotated images showcasing various morphological abnormalities in human sperm, ranging from abnormal heads to tail irregularities16. The HuSHeM (Human Sperm Head Morphology) database18 includes 216 sperm heads categorized as normal (54), tapered (53), pyriform (57), and amorphous (52), while the SCIAN-MorphoSpermGS22 dataset (Gold standard for computer-assisted sperm morphology analysis) contains 1,854 sperm heads classified as normal, tapered, pyriform, small, and amorphous. Similarly, the MHSMA (Modified Human Sperm Morphology Analysis) dataset offers a collection of images for automatically assessing human sperm, categorized based on acrosome, head, vacuole, and tail abnormalities17. Meanwhile, the SMIDS (Sperm Morphology Image Data Set) is a promising resource for automated sperm detection and classification, with 1021 sperm labeled as normal, 1005 abnormal, and 974 non-sperm19. However, despite their wealth of information, these datasets predominantly focus on 2D static representations of sperm morphology, lacking comprehensive temporal information on sperm flagellar movement.
Assessing sperm motility introduces additional complexities compared to evaluating sperm morphology due to the requirement of temporal information (2D+t). The integration of the Z-axis in recent sperm motility analysis has significantly enhanced the study of sperm swimming behavior, enabling three-dimensional (3D+t) tracking21,23. Traditionally, Computer-Assisted Semen Analysis (CASA) systems, widely used for sperm motility assessment, track sperm head trajectories in 2D10. These trajectories are used to compute motility measurements such as VCL (Curvilinear Velocity), VSL (Straight Line Velocity), and VAP (Average Path Velocity). However, recent approaches have evolved to directly analyze sperm motility using both 2D+t and 3D+t methods, with a particular focus on flagellar movement. A summary of available sperm tracking datasets is presented in Table 1, categorizing datasets based on head trajectory tracking or flagellar movement in 2D+t and 3D+t imaging modalities.
Table 1.
Summary of sperm tracking datasets.
| Dataset | Description | Imaging | Summary | Available at |
|---|---|---|---|---|
| Saggiorato et al.3 | Flagellum centerline 2D+t | Olympus IX91 microscope with a dark-field condenser | Sperm attached to glass. Non-hyperactive cells (6): 8,002 traces. Hyperactivated cells (6): 5,757 traces. Progesterone-treated cells (26): 343,077 traces. Others (6 cells): 67,478 traces. | 10.5281/zenodo.884626 |
| Walker et al.11 | Flagellum centerline 2D+t | Phase-contrast microscope | Bovine Spermatozoa (216 cells): 21,600 traces | 10.5287/bodleian:dpw9XANJa |
| Guasto et al.12 | Flagellum centerline 2D+t | Nikon Ti-E Microscope with phase-contrast | 78 Ciona: 96,266 traces. 11 Human High: 5,706 traces. 99 L_variegatus: 177,572 traces. 48 Abalon: 35,232 traces. 1 Bull: 50 traces. 29 Zebrafish: 31,709 traces. 133 L_pictu: 400,840 traces. 36 Human low: 9,551 traces. 4 Arbacia: 10,408 traces. 63 Spuspuratus: 268,412 traces. | 10.7910/DVN/CPAPV1 |
| Dardikman-Yoffe et al.21 | Sperm head and Flagellum centerline 3D+t | Olympus UPLSAPO. Off-axis holographic phase microscopy. 1/2000 seconds time interval. | No information on the number of cells. | Upon request |
| Gong et al.6 | Flagellum centerline 3D+t | Olympus IX71 microscope. Digital inline holographic microscopy. 1/1000 seconds time interval. | Human sperm (14 cells): 34,386 traces. Sea urchin sperm (5 cells): 10,655 traces. | 10.5281/zenodo.4709820 |
| Powar et al.5 | Flagellum centerline 3D+t | Olympus AX-70 with dark-field condenser | No information on the number of cells. | Upon request |
| Thambawita et al.20 | Head Tracking for Human Sperm (2D+t) | Olympus CX31 microscope with phase-contrast optics | 20 video recordings. 656,334 bounding boxes. Labels per spermatozoon: “Normal”, “Pinhead” and “Cluster” | 10.5281/zenodo.7293726 |
| 3D-SpermFlagella (Ours) | Flagellum centerline 3D+t | Olympus IX71 microscope. Piezoelectric device allows moving the focal plane. | Human sperm, free swimming. 49 NCC: 9,732 traces. 86 CC: 14,308 traces. | 10.5281/zenodo.15299846 |
VISEM-Tracking, presented by Thambawita et al.20, offers a comprehensive dataset comprising video clips of human semen samples. It includes information on 2D bounding boxes of sperm heads, along with labels for different types of spermatozoa. This dataset consists of 656,334 annotated objects and primarily focuses on head tracking in 2D+t, lacking information about the flagellum. Saggiorato et al.3 utilized a dark-field microscope to capture 2D+t images of tethered human sperm samples using a CMOS camera at 500 frames per second (FPS). This setup enabled the automatic tracking of the flagellum centerline from 44 cells, resulting in a total of 424,314 traces. However, this dataset only includes tethered sperm samples. Walker et al.11 employed a phase-contrast microscope to acquire 2D+t images of bovine spermatozoa, comprising a dataset of 216 sperm with 21,600 traces. Although substantial, this dataset does not include human sperm samples. Guasto et al.12 released a large 2D+t flagellum dataset encompassing multiple species, captured using a phase-contrast microscope at varying frame rates ranging from 250 FPS to 750 FPS. This dataset comprises 502 sperm samples, totalling 1,035,746 traces, with 47 sperm corresponding to humans, resulting in 15,257 traces. The main focus of the previously released datasets is on 2D+t data, overlooking the 3D+t swimming behavior of sperm.
In the context of 3D+t flagellum datasets, visualization methods require complex techniques to extract 3D information from 2D images. Holography employs the Rayleigh-Sommerfeld back propagator, while the dark-field approach estimates the out-of-plane displacement of the flagellum (Z-axis) using a thin-lens approximation. Dardikman-Yoffe et al.21 utilized off-axis holographic microscopy to image human sperm at 2,000 FPS. However, the size and composition of the dataset are not specified, and the data are available only upon request. Gong et al.6 employed digital inline holographic microscopy at 1,000 FPS to visualize 19 sperm, resulting in 45,041 traces, with 14 sperm corresponding to humans, yielding 34,386 traces. Powar et al.5 used dark-field microscopy to acquire 2D+t images, but the number of cells and traces released is not reported, and the data are likewise available only upon request.
Our dataset, 3D-SpermFlagella2, offers a comprehensive, openly accessible resource to researchers worldwide. While existing datasets may include 3D+t flagellum traces, they suffer from limited sample sizes. In contrast, our dataset provides 135 human sperm centerlines reconstructed from multifocal optical microscopy with a piezo-driven focal plane1,23. This approach enables volumetric imaging, yielding reliable 3D flagellar centerlines that can serve as a robust foundation for motility analysis, for reproductive health research, and for the benchmarking of segmentation and tracking algorithms8,24–27.
The release of 3D-SpermFlagella2 marks a significant milestone in this endeavour, offering a rich dataset of 135 free-swimming sperm with annotated 3D+t flagellum centerline positions. Capturing flagellar dynamics under both non-capacitating and capacitating conditions, it provides a unique resource for elucidating human sperm motility and for advancing AI-based analysis, while preserving natural spatial behaviour without the need for staining agents.
Methods
Data acquisition / imaging
The 3D-SpermFlagella dataset2, consisting of 3D+t human sperm flagellum centerlines, is derived from the images recently published in the article describing the 3D+t Multifocal Imaging Dataset of Human Sperm (3D-SpermVid)1.
The 3D+t images were acquired using a custom videomicroscopy system based on an inverted Olympus IX71 microscope mounted on a vibration-isolated optical table (TMC, GMP SA, Switzerland). The imaging setup included a 60X water immersion objective (NA = 1.00, Olympus UIS2 LUMPLFLN 60X W) coupled to a piezoelectric device (P-725, Physik Instruments, MA, USA), enabling axial oscillations. These oscillations, controlled by a servo-controller (E501) and a high-current amplifier (E-55, Physik Instruments, MA, USA), operated at 90 Hz with an oscillation amplitude of 20 μm.
Image acquisition was performed using a NAC Q1v high-speed camera with 8 GB of internal memory, capturing frames at 5,000 or 8,000 frames per second (fps). The released TIF hyperstacks remain at the native resolution, with no resizing or binning applied. Some hyperstacks have a reduced vertical size due to the removal of information labels that could not be disabled during the acquisition process. As a result, some hyperstacks (CC) retain the full 640 × 480 pixel size, while others (NCC) have a resolution of 640 × 448 pixels. Under these conditions, the system recorded 27,000 images over 5.5 or 3.5 seconds, depending on the selected frame rate. Synchronization of the acquisition system was managed via an E-506 function generator (NI USB-6211, National Instruments, USA). The sample temperature was maintained at 37 °C to ensure stable experimental conditions using a thermal controller (Warner Instruments, TCM/CL100).
A full description of the image dataset is provided by Montoya et al.1.
Data annotation
The entire dataset was annotated using a custom-made code (available at https://github.com/paul-hernandez-herrera/LIVC_UNAM/tree/main/matlab_code/Sperm_tracing_3D_Release). This solution was designed for visualization, annotation, semi-manual tracing, data curation, and data post-processing. Figure 2 summarizes the annotation workflow, which is briefly explained in the subsequent subsections, while Fig. 3 shows its visual representation (a higher-quality version is provided in Supplementary Fig. S1).
Fig. 2.

Dataset Annotation Workflow. The annotation process begins by selecting a 3D+t image stack at a specific time point. A Graphical User Interface (GUI) is used to set the flagellum’s tip position. This position, along with the 3D stack, guides a wavefront propagation along the flagellum’s centerline, which stops once it reaches the tip. A visual inspection is performed to verify the accuracy of the extracted centerline. If errors are detected, a GUI is used to make corrections. Once the centerline is confirmed to be accurate, the flagellum’s coordinates are saved in a file.
Fig. 3.
Overview of the annotation methodology for flagellum tracing. (a) A GUI enables manual annotation of the flagellum tip point, (b) the annotated tip point and a centerline tracing algorithm are employed to generate the flagellum centerline, (c) all traces are inspected using projections along the X, Y, and Z axes to identify inaccuracies or errors in tracing, (d) to address any identified errors, a GUI enables user-guided refinement of the trace, and (e) the traced flagellum, represented in voxel coordinates, is transformed into (f) real-space coordinates (micrometers) for further analysis and interpretation.
Flagellum tip annotation
The tip of the sperm flagellum is the farthest and thinnest point, posing challenges for automatic identification, detection, and localization in the image. To address this, we developed a custom Matlab28 Graphical User Interface (GUI) that displays background-subtracted maximum intensity projections of the 3D stacks in XY, XZ, and YZ. Using this GUI, we manually set the 2D position of the flagellum tip in the XY plane with a mouse click. Additionally, views in the XZ and YZ planes are available to help identify the correct 3D position. Each flagellum’s tip position served as a reference point for guiding the semi-manual tracing process, as detailed in subsequent sections. In a few cases where the flagellum’s tail aligns with the Z-axis, a bright structure is formed along the Z-axis, and the tracing algorithm fails to detect the 3D position of the flagellum’s tail. In such instances, we annotated the flagellum’s tip position in the YZ or XZ plane to accurately determine its 3D position. Figure 3a illustrates the GUI used for annotating the flagellum tip, showing sample points that were manually set. To ensure smooth tracking, the GUI displays the previous 10 annotated points, helping to prevent abrupt changes in the flagellum tip’s trajectory.
Automatic flagellum’s centerline annotation
The centerline tracing algorithm determines the path of minimal cost between two key points: the initial point at the sperm’s head and the terminal point at the flagellar tip. The initial point is automatically detected based on prior knowledge that the sperm head is the brightest and largest structure of the 3D stack. The centroid of the detected sperm head serves as the designated initial point for the tracing process. The terminal point corresponds to the flagellum tip, which is manually annotated as explained in the previous section. If the terminal point is annotated in 3D coordinates, it is directly utilized as the unique terminal point. However, if set in 2D coordinates, we convert it to 3D coordinates by generating several points along the Z-axis (equivalent to the number of planes in the Z-axis). Subsequently, we connect these two points (initial and terminal point) using a minimal-path algorithm, similar to the methods described in29–31. This algorithm guides the path through the center of the flagellum by using a cost function based on the intensity values of the 3D image stack. In the 2D case where multiple points are generated along the Z-axis, we evaluate all possible paths and select the one with the minimum cost as the unique path. This approach automatically connects the sperm head and the flagellar tip using a sequence of neighboring points along the center of the flagellum. Figure 3b illustrates projections of the 3D image stack with an overlay of the automatic annotation, while Fig. 3e illustrates a 3D volume rendering of the 3D stack with an overlay of the automatic annotation (tracing) in blue.
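The minimal-path step described above can be illustrated with Dijkstra's algorithm on a small intensity volume. This is a sketch under stated assumptions, not the authors' MATLAB implementation: the cost function `1 / (eps + intensity)`, the 26-connected neighbourhood, and the function names `minimal_path` and `trace_with_2d_tip` are all hypothetical choices made for illustration. For a tip annotated only in XY, one candidate terminal point is generated per Z plane and the cheapest path is kept, as in the text.

```python
import heapq
from itertools import product

# 26-connected neighbourhood offsets (assumed; the source does not state the connectivity)
NEIGHBOURS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def minimal_path(volume, start, end, eps=1e-3):
    """Dijkstra minimal-cost path between two voxels of a 3D intensity volume.

    volume[z][y][x] holds non-negative intensities; bright voxels are cheap to enter.
    Returns (total_cost, path), where path is a list of (z, y, x) tuples.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])

    def cost(v):
        z, y, x = v
        return 1.0 / (eps + volume[z][y][x])  # assumed intensity-based cost

    best = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        dist, v = heapq.heappop(heap)
        if v == end:
            break
        if dist > best.get(v, float("inf")):
            continue  # stale queue entry
        for dz, dy, dx in NEIGHBOURS:
            w = (v[0] + dz, v[1] + dy, v[2] + dx)
            if not (0 <= w[0] < nz and 0 <= w[1] < ny and 0 <= w[2] < nx):
                continue
            new_dist = dist + cost(w)
            if new_dist < best.get(w, float("inf")):
                best[w] = new_dist
                parent[w] = v
                heapq.heappush(heap, (new_dist, w))

    # Walk back from the terminal point to recover the path
    path = [end]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return best[end], path[::-1]


def trace_with_2d_tip(volume, head, tip_yx):
    """Tip known only in XY: try one candidate per Z plane and keep the cheapest path."""
    candidates = [(z, tip_yx[0], tip_yx[1]) for z in range(len(volume))]
    return min(minimal_path(volume, head, tip) for tip in candidates)


# Toy volume: 3 planes of 5 x 5 voxels with a bright diagonal "flagellum" rising in z
volume = [[[0.0] * 5 for _ in range(5)] for _ in range(3)]
for z, y, x in [(0, 0, 0), (0, 1, 1), (1, 2, 2), (2, 3, 3), (2, 4, 4)]:
    volume[z][y][x] = 1.0

cost, path = trace_with_2d_tip(volume, head=(0, 0, 0), tip_yx=(4, 4))
print(path)
```

In this toy case the selected path follows the five bright voxels and ends in the top plane, because any path ending in another plane must enter at least one dark, expensive voxel.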
Data curation
While the semi-manual tracing algorithm works correctly for most stacks, some factors, such as high flagellum bending, low flagellum intensity, and artifacts caused by light diffraction, may lead to inaccuracies in flagellum tracing. To address this issue, we employ a custom-made GUI that allows interactive visualization of XY, XZ, and YZ maximum intensity projections of the 3D stack (see Fig. 3a). This GUI also overlays the semi-manual trace, thereby facilitating the visual validation of each trace’s accuracy. Additionally, it enables the modification of the initial point (sperm head) and terminal point (flagellum tip), the exclusion of specific regions during tracing, and the re-execution of the tracing algorithm (see Fig. 3d). In cases where semi-manual tracing fails, the GUI features are used to correct errors and trace a precise centerline. Figure 3c shows an example of an incorrectly traced flagellum, detected using the projections of the trace. In Fig. 3d, we display the GUI utilized for correcting such traces with manually curated data.
Finally, we removed the centerline points corresponding to the centroid of the head-to-neck region because the z-position of the head cannot be reliably determined. This limitation arises from the nonuniform spacing along the z-axis and diffraction artifacts, which hinder precise 3D localization. To address this, we implemented a GUI tool for manual annotation of the neck position in the XY plane; this annotated point is then used to remove the head region.
Conversion of traces from voxels to real coordinates (micrometers)
To obtain the traces, we utilize the coordinate system based on the voxel positions within the 3D image stack. The x and y voxel positions of the trace are converted to real coordinates in micrometers using the pixel size of the image sensor (118/640 = 0.1844 µm). Additionally, we employed the z-axis values in micrometers of each slice provided with the image dataset to convert the z voxel position of the trace to micrometers. This process ensures that the trace accurately represents the spatial coordinates of the sperm cell in the real-world physical space, facilitating precise analysis and interpretation of the data. Finally, the centerline is smoothed using fifth-degree smoothing splines and interpolated to maintain a constant spacing of 0.1 µm between consecutive flagellum centerline points. Figure 3e depicts an example of the centerline in voxel coordinates, while Fig. 3f depicts the centerline in real-space coordinates (micrometers).
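The conversion can be sketched as follows. The XY pixel size (118/640 µm) and the 0.1 µm target spacing come from the text; everything else is an assumption for illustration: the function names are hypothetical, z indices are taken to be integers into a per-slice height list, and the fifth-degree smoothing spline is replaced here by plain linear interpolation along the arc length, so this shows only the resampling idea and not the published smoothing.

```python
import math

PIXEL_SIZE_UM = 118 / 640  # XY pixel size in micrometers, as stated in the text


def voxels_to_micrometers(trace_vox, z_positions_um):
    """Convert (x, y, z) voxel indices to micrometers.

    z_positions_um[k] is the height of slice k; integer z indices are assumed.
    """
    return [(x * PIXEL_SIZE_UM, y * PIXEL_SIZE_UM, z_positions_um[z])
            for x, y, z in trace_vox]


def resample_constant_spacing(points, spacing=0.1):
    """Resample a polyline at constant arc-length spacing by linear interpolation.

    The source smooths with fifth-degree splines first; that step is omitted here.
    """
    if len(points) < 2:
        return list(points)

    # Cumulative arc length at each input point
    arc = [0.0]
    for p, q in zip(points, points[1:]):
        arc.append(arc[-1] + math.dist(p, q))

    resampled, seg = [], 0
    n_samples = int(arc[-1] / spacing + 1e-9) + 1
    for i in range(n_samples):
        s = i * spacing
        # Advance to the segment that contains arc length s
        while seg < len(arc) - 2 and arc[seg + 1] < s:
            seg += 1
        length = arc[seg + 1] - arc[seg]
        t = (s - arc[seg]) / length if length > 0 else 0.0
        p, q = points[seg], points[seg + 1]
        resampled.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return resampled


# Hypothetical example: three voxels in slices whose heights are not evenly spaced
z_um = [0.0, 0.4, 1.1]
trace_um = voxels_to_micrometers([(0, 0, 0), (10, 0, 1), (20, 0, 2)], z_um)
dense = resample_constant_spacing(trace_um, spacing=0.1)
print(len(trace_um), "input points ->", len(dense), "resampled points")
```

Looking up z from the per-slice heights, rather than multiplying by a constant, matters because the slice spacing along Z is not uniform.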
Figure 4 depicts examples of 8 distinct experiments reconstructed with semi-manual annotations acquired with the methodology described above. The trajectory of the sperm can follow any direction over time (from blue to red) due to the random selection of sperm within the microscopy field of view. Additionally, the reconstruction time (in seconds) may vary depending on the duration that sperm remains in the field of view during the acquisition. For a detailed visualization of the 3D+t sperm swimming behaviour, see Videos 5–12 (10.5281/zenodo.18394232)32.
Fig. 4.
Digital Reconstruction of Flagellum Centerline. Eight representative examples of traced flagellar centerlines are shown. Each color corresponds to a different time point, with blue indicating the initial time and red the final time. The sphere denotes the position of the neck. The left column displays examples under non-capacitating conditions, while the right column shows examples under capacitating conditions.
Data Preprocessing
3D+t annotation of NCC sperm
The 3D spacing between slices along the Z-axis in the 3D stack is not constant. While this variation does not affect the overall 3D reconstruction of the sperm, it may pose challenges for automatic tracing, particularly due to diffraction intensity issues. To address this, for the NCC dataset, we pre-processed the 3D stack to reduce the need for data curation by employing interpolation techniques to achieve uniform z-spacing. Specifically, we set a fixed spacing of 0.5 µm between slices along the Z-axis. The extracted 3D coordinates from the uniform stack are then mapped back to the original stack. It is important to note that the methodology for semi-automatic tracing was developed and refined over several years, with this modification introduced recently.
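A minimal sketch of this pre-processing step is given below. The 0.5 µm target spacing is from the text; the use of linear interpolation, the ascending order of slice heights, the nearest-slice rule for mapping back, and all function names are assumptions, since the source only states that interpolation techniques were employed.

```python
from bisect import bisect_right


def uniform_z_grid(z_positions_um, spacing=0.5):
    """Evenly spaced heights covering the range of the measured slice positions."""
    z0, z1 = z_positions_um[0], z_positions_um[-1]
    n = int((z1 - z0) / spacing + 1e-9) + 1
    return [z0 + i * spacing for i in range(n)]


def interp_profile(z_positions_um, intensities, z_query):
    """Linearly interpolate one pixel's intensity profile along z at height z_query."""
    k = bisect_right(z_positions_um, z_query) - 1
    k = max(0, min(k, len(z_positions_um) - 2))  # clamp to a valid segment
    za, zb = z_positions_um[k], z_positions_um[k + 1]
    t = (z_query - za) / (zb - za)
    return intensities[k] + t * (intensities[k + 1] - intensities[k])


def resample_stack_z(stack, z_positions_um, spacing=0.5):
    """Resample stack[z][y][x] onto a uniform z grid; returns (new_stack, new_z)."""
    new_z = uniform_z_grid(z_positions_um, spacing)
    ny, nx = len(stack[0]), len(stack[0][0])
    new_stack = [
        [[interp_profile(z_positions_um, [plane[y][x] for plane in stack], zq)
          for x in range(nx)] for y in range(ny)]
        for zq in new_z
    ]
    return new_stack, new_z


def nearest_original_slice(z_positions_um, z_um):
    """Map a height back to the index of the closest original slice (assumed rule)."""
    return min(range(len(z_positions_um)), key=lambda k: abs(z_positions_um[k] - z_um))


# Hypothetical 1 x 1 pixel stack whose intensity grows linearly with height
z_um = [0.0, 0.4, 1.1, 1.5, 2.0]
stack = [[[10.0 * z]] for z in z_um]
new_stack, new_z = resample_stack_z(stack, z_um)
print(new_z)
```

Tracing then runs on `new_stack`, and each traced z index `i` corresponds to the height `new_z[i]`, which can be related back to the original slices.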
Data Records
The 3D-SpermFlagella dataset is available at Zenodo2. This dataset comprises 135 three-dimensional reconstructed samples of sperm beating over time. Specifically, 49 annotations correspond to NCC sperm, while 86 annotations correspond to CC sperm.
Dataset folder structure
The 3D-SpermFlagella2 dataset contains two top-level folders:
- traces_micrometers
- traces_voxels
Each contains 12 subfolders, corresponding to experiments conducted on specific dates. These subfolders match repositories in the 3D-SpermVid1 imaging dataset and are titled:
- 3D+t freely swimming human sperm incubated in [condition] [YYYY-MM-DD]
Examples:
- 3D+t freely swimming human sperm incubated in Non-Capacitating Conditions (NCC) 2021-07-30
- 3D+t freely swimming human sperm incubated in Capacitating Conditions (CC) 2019-03-26
Within each date folder are subfolders for individual experiments, titled:
- Sperm-[Cell_number]-[Condition]-[Date]_Exp[Experiment_number]
Example:
- Sperm-2-NoCap_210730_Exp5
If two or more cells were recorded in the same video, a suffix distinguishes them:
- Sperm-5-Cap_190326_Exp8_cell-1
- Sperm-5-Cap_190326_Exp8_cell-2
Each cell folder contains four files:
- Initial_tp.csv: contains a single integer indicating the time point in the 3D+t image that corresponds to the first trace in the X.csv, Y.csv, and Z.csv files.
- X.csv: an N × T matrix of X-coordinates, where rows correspond to points along the head–flagellum centerline, and columns correspond to time points.
- Y.csv: same format as X.csv, containing Y-coordinates.
- Z.csv: same format as X.csv, containing Z-coordinates.
Here, N is the number of points per trace, and T is the number of reconstructed stacks. Row 1 corresponds to the sperm neck, while the last point in each column corresponds to the flagellum tip. Note that the number of annotated points may be less than N due to several factors: (i) orientation: the orientation of the trace affects the number of annotated points. For example, a diagonal centerline trace at 45° in XY with 10 points has a length of approximately 14 voxels, whereas a trace at 0° (parallel to the X-axis) requires 14 points to reach the same length; (ii) imaging limitations: in some stacks, the flagellum may be captured with varying lengths; and (iii) voxel size: the spacing along the z-axis is not uniform, which also affects the number of annotated points. For stacks with fewer than N annotated points, the remaining entries are filled with nan values. Figure 5 depicts an example of the dataset format, where the red rectangle corresponds to the X coordinates of the first traced stack, while the blue rectangle contains the X coordinates of the fourth annotated stack. The Y.csv and Z.csv files complete the 3D coordinates for the annotation.
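As an illustration of this layout, the sketch below reads one cell folder and returns one list of (x, y, z) points per time point with the nan padding removed. The file names come from the dataset description; the parsing details are assumptions (comma-delimited files without a header row, padding written as `nan` or left empty), and `read_matrix` and `load_cell` are hypothetical helper names. The demo builds a tiny synthetic folder so that the snippet runs without the dataset.

```python
import csv
import math
import os
import tempfile


def read_matrix(path):
    """Read an N x T CSV of floats; empty or 'nan' cells become math.nan."""
    with open(path, newline="") as f:
        return [[float(c) if c.strip() else math.nan for c in row]
                for row in csv.reader(f) if row]


def load_cell(folder):
    """Return (initial_tp, traces); traces[t] lists (x, y, z) from neck to tip."""
    xs, ys, zs = (read_matrix(os.path.join(folder, name))
                  for name in ("X.csv", "Y.csv", "Z.csv"))
    with open(os.path.join(folder, "Initial_tp.csv")) as f:
        initial_tp = int(float(f.read().strip()))
    n_points, n_times = len(xs), len(xs[0])
    traces = []
    for t in range(n_times):
        pts = [(xs[i][t], ys[i][t], zs[i][t]) for i in range(n_points)]
        # Drop the nan padding of traces that have fewer than N points
        traces.append([p for p in pts if not any(math.isnan(v) for v in p)])
    return initial_tp, traces


# Demo on a synthetic cell folder (3 points x 2 time points, second trace is shorter)
with tempfile.TemporaryDirectory() as demo:
    for name, rows in {"X.csv": ["1,2", "3,4", "5,nan"],
                       "Y.csv": ["0,0", "1,1", "2,nan"],
                       "Z.csv": ["7,7", "8,8", "9,nan"]}.items():
        with open(os.path.join(demo, name), "w") as f:
            f.write("\n".join(rows))
    with open(os.path.join(demo, "Initial_tp.csv"), "w") as f:
        f.write("12")
    initial_tp, traces = load_cell(demo)

print(initial_tp, [len(trace) for trace in traces])
```

Storing each trace as a variable-length list avoids propagating nan values into later length or curvature computations.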
Fig. 5.

Example of the dataset format, which consists of four files. Initial_tp.csv contains the initial time point annotated. X.csv, Y.csv, and Z.csv store the coordinates in voxels for each time point, where rows correspond to annotations and columns represent time points. The red and blue rectangles show the X coordinates for the first and fourth annotated time points.
Annotation dataset description
We annotated 135 sperm cells in 3D+t, yielding 24,040 semi-automatic flagellum centerline annotations from 3D image stacks. The dataset comprises two biological conditions, as detailed by Montoya et al.1 and Hernández et al.25: (1) control sperm incubated in non-capacitating conditions (NCC), and (2) sperm exposed to capacitation conditions (CC). The “control” subset, i.e. the NCC, consists of annotations from 49 sperm cells, totaling 9,732 annotated three-dimensional stacks. In contrast, the CC subset includes annotations from 86 sperm cells, totaling 14,308 annotated three-dimensional stacks. Notably, the 3D-SpermVid dataset provides a larger number of hyperstacks for CC than for NCC, and our annotations reflect this distribution. This imbalance is biologically motivated, as only 10–20% of sperm under CC exhibit hyperactivation, and sperm in CC display a broader range of behavioral dynamics1.
The acquisition of 3D+t images was performed at a frequency of 90 Hz, resulting in a time interval of 1/90 seconds between consecutive traces. Table 2 summarizes the dataset statistics for both the control (NCC) and capacitating (CC) subsets, as well as the full dataset (referred to as 3D-SpermFlagella). On average, each sperm cell in the dataset has been annotated for 178 stacks (see Table 2), roughly corresponding to 2 seconds of swimming. Figure 6 shows a histogram illustrating the distribution of tracking duration in seconds per sperm cell.
Table 2.
Summary of the total number of reconstructed traces per sperm.
| Dataset | Sperm Count | Total | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|---|
| NCC | 49 | 9,732 | 199 | 95 | 70 | 136 | 170 | 226 | 499 |
| CC | 86 | 14,308 | 166 | 66 | 65 | 119 | 154 | 200 | 440 |
| Full | 135 | 24,040 | 178 | 79 | 65 | 125 | 157 | 212 | 499 |
The tracking duration in seconds is obtained by dividing the number of traces by the acquisition rate of 90 Hz.
Fig. 6.
Summary of tracking duration in seconds (“Control/NCC”, “CC”, and 3D-SpermFlagella). The X-axis corresponds to the tracking duration, while the y-axis indicates the sperm count.
Data overview
The 3D-SpermFlagella dataset2 contains the 3D coordinates of sperm flagellum centerlines over time (t), derived from the spatiotemporal (3D+t) image dataset 3D-SpermVid1. It comprises 135 human sperm cells, each tracked for approximately 0.7 to 5.5 seconds (65 to 499 stacks at 90 Hz; see Table 2). The number of time points varies depending on the specific acquisition. Cells were incubated under two experimental conditions: 49 under non-capacitating conditions (NCC) and 86 under capacitating conditions (CC). For each time point (i.e., each volumetric image), the flagellum centerline is represented as a sequence of 3D coordinates, resulting in a spatiotemporal reconstruction of flagellar motion. All samples were imaged in aqueous medium; a detailed description of the imaging protocol is available in Montoya et al.1. Figure 1 provides an overview of the dataset. Figure 1a shows the overlay of the extracted centerlines (red) at different heights within a single volumetric image, i.e., a 3D image stack. Figure 1b illustrates an example of the 3D centerline positions at a specific time point, representing the main contribution of the dataset in voxels. Figure 1c shows 4 different cells and their corresponding temporal evolution (multiple time points). The coordinates along the axes are expressed in voxels, with anisotropic voxel spacing due to the image acquisition protocol (higher resolution in the xy plane than along the z-axis). The color scale indicates time in seconds. Videos 1–4 (10.5281/zenodo.18394232)32 display flagellar centerlines reconstructed from image stacks, with the sperm neck (defined as the head–midpiece junction) indicated by a sphere in real-space coordinates (micrometers).
Fig. 1.
Overview of 3D-SpermFlagella. The dataset consists of 3D+t coordinates representing the flagellum’s centerline over time, obtained from a sequence of 3D image stacks acquired at different time points. Each 3D stack comprises a series of images captured at different focal planes. (a) An overlay of images at different Z heights (focal plane) with the corresponding centerline. (b) The dataset for a specific time/stack consists of the 3D coordinates of the flagellum’s centerline, represented as a sequence of points starting at the sperm neck and ending at the flagellum’s tip. (c) Examples of sperm swimming behavior over time in voxel coordinates, illustrating the trajectories captured in the dataset (49 NCC and 86 CC).
Technical Validation
The annotated dataset was carefully examined visually. This involved analyzing projections along the X, Y, and Z axes of both the trace (a visual representation of the flagellum’s centerline) and the 3D image stack. This step ensured that the intensity and flagellum were accurately aligned. When errors were found in the trace, they were corrected using the graphical user interface (GUI) described above. This validation and correction process was repeated for each annotated flagellum trace, ensuring accuracy throughout.
Furthermore, we utilized the ParaView application to create 3D visualizations. This allowed us to overlay the 3D image stack onto the tracing, ensuring precise alignment in all three dimensions. By comparing the flagellum’s intensity profile with the trace, we confirmed proper overlap and tracing accuracy. This thorough approach ensured the reliability of our data analysis. Figure 3e depicts an illustration of this alignment process.
Usage Notes
This dataset represents a significant advancement in our understanding of human sperm motility, being the largest collection of 3D annotations of flagellum centerline, with a total of 135 human sperm cells annotated (approximately 2 seconds of swimming per sperm and temporal resolution of 1/90 seconds). Notably, it stands as the pioneering dataset featuring sperm exposed to capacitating media, a crucial factor for inducing hyperactivation, a motility pattern observed in sperm during fertilization.
Researchers can leverage this rich resource to delve into the intricate beat patterns of human sperm flagella, in both 2D and 3D, and validate existing mathematical models governing sperm movement3–7,33. Several studies have already been based on this dataset, analysing flagellar dynamics in three-dimensional space, considering features such as the tortuosity, fractal dimension, and enveloping ellipses8,9,24–27. Moreover, it has also served for developing novel methodologies to characterize hyperactivation based on flagellar motility in 3D, underpinning advancements in our understanding of this pivotal sperm motility trait25,26. Computer-Assisted Sperm Analysis (CASA) systems, commonly employed for sperm motility assessment, traditionally track sperm head trajectories in 2D10. Our dataset offers a unique opportunity to investigate potential disparities between classical measures such as VSL, VCL, VAP, LIN, and WOB, among others, in both 2D and 3D contexts. This exploration could pave the way for new research in CASA systems, shedding light on how these measures vary across dimensions and inspiring new avenues for improvement in sperm motility analysis.
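As a hedged illustration of such a 2D versus 3D comparison, the sketch below computes VCL (path length over time), VSL (straight-line displacement over time), and LIN (VSL/VCL) for a trajectory and for its XY projection. It assumes that the neck position (row 1 of each trace, in micrometers) is used in place of the head centroid that CASA systems track, and that consecutive traces are 1/90 s apart. The function names are hypothetical, and VAP and WOB are omitted because they depend on a smoothed average path whose window is implementation specific.

```python
import math

FRAME_INTERVAL_S = 1 / 90  # time between consecutive traces (90 Hz acquisition)


def kinematics(track, dt=FRAME_INTERVAL_S):
    """VCL and VSL (micrometers per second) and LIN for positions sampled every dt.

    track needs at least two positions; each position is a tuple of coordinates.
    """
    duration = (len(track) - 1) * dt
    path_length = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    vcl = path_length / duration
    vsl = math.dist(track[0], track[-1]) / duration
    return {"VCL": vcl, "VSL": vsl, "LIN": vsl / vcl if vcl > 0 else math.nan}


def compare_2d_3d(track_3d, dt=FRAME_INTERVAL_S):
    """Compute the same measures on the full 3D track and on its XY projection."""
    track_2d = [(x, y) for x, y, _ in track_3d]
    return {"2D": kinematics(track_2d, dt), "3D": kinematics(track_3d, dt)}


# Synthetic helical trajectory (hypothetical values, in micrometers)
track = [(0.5 * i, 2 * math.cos(i), 2 * math.sin(i)) for i in range(91)]
result = compare_2d_3d(track)
for label, values in result.items():
    print(label, {name: round(value, 2) for name, value in values.items()})
```

Because projecting onto XY can only shorten each step, the 2D VCL is never larger than the 3D VCL, which is one way in which 2D measures may underestimate the curvilinear motion of a cell that also moves along Z.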
Beyond its immediate applications in reproductive biology, this dataset can increase its applicability with the 3D-SpermVid1 dataset to feed the training and assessment of machine learning algorithms for sperm neck and flagellum segmentation. This provides new possibilities for the computer science community to refine algorithms for tracking sperm and flagellum centerlines in 2D and 3D brightfield images, supported by a robust ground-truth dataset with a total of 24,040 annotations of sperm flagellum centerlines.
Acknowledgements
This project has been made possible in part by grant number 2023-329644 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation. A SECIHTI scholarship was granted to ABS. This work was supported in part by the Universidad Nacional Autónoma de México (UNAM) - Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica (PAPIIT) projects IT101624, IN108624 and IN227925.
Author contributions
Conceptualization: P.H.-H., H.O.H., F.M., A.B.-S., D.S.D.-G., A.D., G.C.; Methodology: P.H.-H., H.O.H., F.M., A.B.-S., D.S.D.-G., A.D., G.C.; Software: P.H.-H., H.O.H., F.M., A.B.-S.; Validation: P.H.-H., H.O.H., F.M., A.B.-S., D.S.D.-G., G.C.; Resources: P.H.-H., A.D., G.C.; Writing - original draft: P.H.-H., H.O.H., A.B.-S.; Writing - review & editing: P.H.-H., H.O.H., A.B.-S., F.M., D.S.D.-G., A.D., G.C.; Project administration: P.H.-H., A.D., G.C.; Funding acquisition: P.H.-H., A.D., G.C.
Data availability
The 3D-SpermFlagella2 dataset is openly accessible in the Zenodo repository at the following link: 10.5281/zenodo.15299846.
Code availability
The code utilized for the semi-automated workflow, responsible for tracing the sperm head and flagellum centerline, is openly accessible on https://github.com/paul-hernandez-herrera/LIVC_UNAM/tree/main/matlab_code/Sperm_tracing_3D_Release.
Competing interests
The authors confirm that they have no financial interests or personal relationships that could have influenced the work presented in this paper.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Paul Hernández-Herrera, Email: paul.hernandez@uaslp.mx.
Gabriel Corkidi, Email: gabriel.corkidi@ibt.unam.mx.
Supplementary information
The online version contains supplementary material available at 10.1038/s41597-026-06876-2.
References
- 1. Montoya, F. et al. 3D+t Multifocal Imaging Dataset of Human Sperm. Sci Data 12, 814, 10.1038/s41597-025-05177-4 (2025).
- 2. Hernandez-Herrera, P. et al. 3D-SpermFlagella: 3D+t human sperm flagellum centerline dataset [Data set]. Zenodo 10.5281/zenodo.15299846 (2025).
- 3. Saggiorato, G. et al. Human sperm steer with second harmonics of the flagellar beat. Nature communications 8(1), 1415 (2017).
- 4. Nassir, M., Levi, M. & Shaked, N. T. Dynamic 3D Modeling for Human Sperm Motility through the Female Cervical Canal and Uterine Cavity to Predict Sperm Chance of Reaching the Oocyte. Cells 12(1), 203 (2023).
- 5. Powar, S. et al. Unraveling the kinematics of sperm motion by reconstructing the flagellar wave motion in 3d. Small Methods 6(3), 2101089 (2022).
- 6. Gong, A. et al. Reconstruction of the three-dimensional beat pattern underlying swimming behaviors of sperm. The European Physical Journal E 44(7), 87 (2021).
- 7. Gallagher, M. T., Kirkman-Brown, J. C. & Smith, D. J. Axonemal regulation by curvature explains sperm flagellar waveform modulation. PNAS nexus 2(3), pgad072 (2023).
- 8. Díaz-Guerrero, D. S. et al. Computation of Human-Sperm Local Flagellar Instantaneous Velocity. In Congreso Nacional de Ingeniería Biomédica (pp. 59-66). Cham: Springer Nature Switzerland (2023, October).
- 9. Bribiesca-Sánchez, A. et al. Artifacts generated by the 3D rotation of a freely-swimming human sperm in the measurement of intracellular Ca2+. In Congreso Nacional de Ingeniería Biomédica (pp. 355-362). Cham: Springer International Publishing (2022, October).
- 10. Alquézar-Baeta, C. et al. OpenCASA: A new open-source and scalable tool for sperm quality analysis. PLoS computational biology 15(1), e1006691 (2019).
- 11. Walker, B. J., Phuyal, S., Ishimoto, K., Tung, C. K. & Gaffney, E. A. Computer-assisted beat-pattern analysis and the flagellar waveforms of bovine spermatozoa. Royal Society open science 7(6), 200769 (2020).
- 12. Guasto, J. S. et al. Flagellar kinematics reveals the role of environment in shaping sperm motility. Journal of the Royal Society Interface 17(170), 20200525 (2020).
- 13. Tufoni, C. et al. Flagellar beating forces of human spermatozoa with different motility behaviors. Reproductive Biology and Endocrinology 22(1), 28 (2024).
- 14. Yazdan Parast, F. et al. Viscous loading regulates the flagellar energetics of human and bull sperm. Small Methods 8(7), 2300928 (2024).
- 15. Hackerova, L. et al. Boar Sperm Motility Assessment Using Computer-Assisted Sperm Analysis: Current Practices, Limitations, and Methodological Challenges. Animals 15(3), 305 (2025).
- 16. Ghasemian, F., Mirroshandel, S. A., Monji-Azad, S., Azarnia, M. & Zahiri, Z. An efficient method for automatic morphological abnormality detection from human sperm images. Computer methods and programs in biomedicine 122(3), 409–420 (2015).
- 17. Javadi, S. & Mirroshandel, S. A. A novel deep learning method for automatic assessment of human sperm images. Computers in biology and medicine 109, 182–194 (2019).
- 18. Shaker, F., Monadjemi, S. A., Alirezaie, J. & Naghsh-Nilchi, A. R. A dictionary learning approach for human sperm heads classification. Computers in biology and medicine 91, 181–190 (2017).
- 19. Ilhan, H. O., Sigirci, I. O., Serbes, G. & Aydin, N. A fully automated hybrid human sperm detection and classification system based on mobile-net and the performance comparison with conventional methods. Medical & biological engineering & computing 58, 1047–1068 (2020).
- 20. Thambawita, V. et al. VISEM-Tracking, a human spermatozoa tracking dataset. Scientific Data 10(1), 1–8 (2023).
- 21. Dardikman-Yoffe, G., Mirsky, S. K., Barnea, I. & Shaked, N. T. High-resolution 4-D acquisition of freely swimming human sperm cells without staining. Science advances 6(15), eaay7619 (2020).
- 22. Chang, V., Garcia, A., Hitschfeld, N. & Härtel, S. Gold-standard for computer-assisted morphological sperm analysis. Computers in biology and medicine 83, 143–150 (2017).
- 23. Corkidi, G., Taboada, B., Wood, C. D., Guerrero, A. & Darszon, A. Tracking sperm in three dimensions. Biochemical and biophysical research communications 373(1), 125–129 (2008).
- 24. Bribiesca-Sánchez, A. et al. A Three-Dimensional Extension of the Slope Chain Code: Analyzing the Tortuosity of the Flagellar Beat of Human Sperm (2023).
- 25. Hernández, H. O. et al. Feature-based 3D+t descriptors of hyperactivated human sperm beat patterns. Heliyon 10(5) (2024).
- 26. Hernández, H. O. et al. 3D+t feature-based descriptor for unsupervised flagellar human sperm beat classification. In 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) (pp. 488-492). IEEE (2022, July).
- 27. Guzmán, A., Darszon, A., Corkidi, G. & Bribiesca, E. A Measure of Tortuosity for 3D Curves: Identifying 3D Beating Patterns of Sperm. In Pattern Recognition and Image Analysis: 11th Iberian Conference, IbPRIA 2023, Alicante, Spain, June 27–30, 2023, Proceedings (Vol. 14062, p. 363). Springer Nature (2023, June).
- 28. The MathWorks Inc. MATLAB version: 24.1.0 (R2024a), Natick, Massachusetts: The MathWorks Inc. https://www.mathworks.com (2024).
- 29. Hernandez-Herrera, P., Montoya, F., Rendón-Mancha, J. M., Darszon, A. & Corkidi, G. 3-D +T Human Sperm Flagellum Tracing in Low SNR Fluorescence Images. IEEE transactions on medical imaging 37(10), 2236–2247 (2018).
- 30. Santamaría-Pang, A., Hernandez-Herrera, P., Papadakis, M., Saggau, P. & Kakadiaris, I. A. Automatic morphological reconstruction of neurons from multiphoton and confocal microscopy images using 3D tubular models. Neuroinformatics 13, 297–320 (2015).
- 31. Arshadi, C., Günther, U., Eddison, M., Harrington, K. I. & Ferreira, T. A. SNT: a unifying toolbox for quantification of neuronal anatomy. Nature methods 18(4), 374–377 (2021).
- 32. Hernández-Herrera, P. et al. Supplementary videos for 3D-SpermFlagella dataset. Zenodo 10.5281/zenodo.18434653 (2026).
- 33. van der Horst, G. Computer Aided Sperm Analysis (CASA) in domestic animals: Current status, three D tracking and flagellar analysis. Animal reproduction science 220, 106350 (2020).