Abstract
Accurate identification of incomplete blinking from eye videography is critical for the early detection of eye disorders or diseases (e.g., dry eye). In this study, we develop a texture-aware neural network based on the classical U-Net (termed TAU-Net) to accurately extract palpebral fissures from each frame of eye videography for assessing incomplete blinking. We introduce three different convolutional blocks based on element-wise subtraction operations to highlight subtle textures associated with target objects and integrate these blocks with the U-Net to improve the segmentation of palpebral fissures. Quantitative experiments on 1396 frame images showed that the developed network achieved an average Dice index of 0.9587 and a Hausdorff distance (HD) of 4.9462 pixels when segmenting palpebral fissures. It outperformed the U-Net and several of its variants, demonstrating promising performance in identifying incomplete blinking from eye videography.
Keywords: Incomplete blinking, Eye videography, Image segmentation, Convolutional blocks
1. Introduction
As a rapid eyelid closing and opening movement, eye blinking [1,2] is critical for ocular surface and tear film homeostasis. It stimulates the secretion of meibomian lipids, facilitates the spreading and mixing of tear film constituents, and provides adequate lubrication for the eyes [3,4]. Eye blinking is generally categorized into three types: spontaneous, voluntary, and reflex [5]. Spontaneous blinking [6] can be further subdivided into complete blinks (in which the upper and lower lids make full contact) and incomplete blinks (in which the upper lid does not fully reach the lower lid) [7]. Incomplete blinking is often regarded as a risk factor for many eye diseases (e.g., dry eye) [8,9] because of its involvement in the development of meibomian gland dysfunction [10], one of the most common ophthalmic conditions in clinical practice [11]. Hence, it is desirable to identify incomplete blinking to facilitate the detection and diagnosis of these eye diseases [12,13]. However, it is difficult to identify incomplete blinks and distinguish them from complete ones, since the two are very similar in appearance and are normally completed within a very short time [14,15]. As a result, few studies have attempted to identify incomplete blinking using automated image processing techniques.
To identify incomplete blinking, a commonly used strategy [8,16] is to capture eye movements over a period of time using high-speed videography or medical devices (e.g., Lipiview) and have clinical ophthalmologists manually process the videos frame by frame [10,15]. This strategy can reveal when an incomplete blink begins and ends, along with its patterns or attributes (e.g., blink rate, amplitude, and duration of closure); however, it is time-consuming and prone to missing some incomplete blinks because of the short duration of a blink cycle [17]. The manual process also involves relatively high inter- and intra-reader variability because there are no clear criteria or quantitative indicators to distinguish incomplete blinks from complete ones. To identify incomplete blinking accurately and efficiently, it is therefore desirable to develop a computer tool that automatically extracts palpebral fissures from each frame of eye videography, since blinking is closely associated with the width of the palpebral fissure.
Deep convolutional neural network (CNN) technology [18,19] has been widely used in various video processing applications [20,21] and has demonstrated performance comparable with manual annotation. Many neural networks have been developed for image segmentation, such as DeepLab [22], U-Net [23], and their variants (e.g., DeepLabv3 [24] and U-Net++ [25]). These networks have shown promising potential for various medical and natural images, but their performance in extracting palpebral fissures from eye videography remains unknown. Compared with other types of medical images, eye videography has some distinctive characteristics that may challenge segmentation, including (1) a complex and irrelevant background surrounding the palpebral fissures that can cause confusion between the background and the fissures, as shown in Fig. 1, and (2) severe motion artifacts [26] that make it difficult to accurately detect the boundaries of the palpebral fissures. Very few of the available neural networks take these characteristics into account and may therefore yield unsatisfactory segmentation performance.
Fig. 1.

Illustration of the complex background surrounding palpebral fissures in consecutive frames of the eye videography.
In this study, we developed a texture-aware neural network based on the U-Net (termed TAU-Net) to identify incomplete blinking by segmenting palpebral fissures from each frame of eye videography. Specifically, we introduced three novel convolutional blocks based on element-wise subtraction operations, referred to as short, medium, and long subtraction blocks, to highlight various subtle structural textures associated with desirable objects. These blocks were then integrated with the U-Net to connect its encoding and decoding convolutional blocks at different levels. Our experiments showed that the developed TAU-Net was capable of accurately and efficiently extracting palpebral fissures and identifying incomplete blinking from eye videography.
2. Materials and methods
2.1. The dataset
To identify incomplete blinking, we collected 60 two-minute videos from the eyes of 60 participants (30 dry eye patients and 30 healthy subjects) using the Keratograph 5M (Oculus, Germany) at the Wenzhou Medical University Eye Hospital (2019–216-k-193). These videos were acquired in an infrared light mode at 8 frames per second (FPS). Each video contained around 1000 frame images with the same dimensions of 1360×1024 pixels. A total of 1196 frame images were randomly chosen from these videos; the regions of the palpebral fissures were manually annotated using the ImageJ software (https://imagej.net/software/imagej/) (Fig. 2) and used as ground truths for image segmentation. These frame images and their annotations were resized to 256×256 pixels and evenly divided into three subsets for training, validation, and testing. In addition, we collected a small dataset of 200 frame images acquired from subjects with infectious keratitis. These 200 images were preprocessed in the same way as the blinking images and used as an additional testing dataset for performance comparison.
Fig. 2.

Three different frame images (first row), the regions for palpebral fissures (second row), and the manual annotations (third row).
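The preprocessing described above amounts to resizing each annotated frame to 256×256 pixels and splitting the data evenly into three subsets. A minimal sketch of this step is given below, assuming OpenCV and scikit-learn as tooling; the file paths, random seed, and helper names are illustrative and not the authors' actual pipeline.

```python
# Illustrative preprocessing sketch: resize frames to 256x256 and split into
# three equal subsets (training/validation/testing). Paths are placeholders.
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

def load_and_resize(paths, size=(256, 256)):
    """Read grayscale frames and resize them to the network input size."""
    return np.stack([cv2.resize(cv2.imread(p, cv2.IMREAD_GRAYSCALE), size) for p in paths])

# Split the 1196 annotated frame indices evenly into three subsets.
indices = np.arange(1196)
train_idx, rest_idx = train_test_split(indices, train_size=1 / 3, random_state=0)
val_idx, test_idx = train_test_split(rest_idx, train_size=0.5, random_state=0)
```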
2.2. Network overview
Fig. 3 showed the architecture of the TAU-Net for segmenting palpebral fissures from each frame of eye videography. The source code for the TAU-Net implementation is available at https://github.com/wmuLei/EyeBlinking. Compared with the classical U-Net, the TAU-Net was characterized by the integration of three different subtraction convolutional blocks (SCBs), namely the short, medium, and long subtraction blocks. The U-Net was used as the backbone network to capture various convolutional features from the input images using plain convolutional blocks (PCBs). The PCB, employed in both the encoding and decoding procedures of the U-Net, was formed by stacking a simple convolutional layer twice, each consisting of a 3×3 spatial convolution, a batch normalization (BN) operation, and a rectified linear unit (ReLU) activation. The output dimensions of the convolutions are given in Fig. 3, namely 32, 64, and 128. The convolutional features captured by the PCBs were fed into the introduced SCBs to detect different types of subtle structural textures. These features and subtle textures were then concatenated as composite features for the subsequent down- and up-sampling operations. The down-sampling operations used a 2×2 MaxPooling layer (MaxPooling2×2) to reduce the spatial dimensions, eliminate redundant composite features, and improve the efficiency of network training. The up-sampling operations used a 2×2 UpSampling layer (UpSampling2×2) to recover the image dimensions and guide the extraction of desirable objects in an end-to-end manner. The final composite features were converted into a probability map of the desirable objects using a 1×1 spatial convolution, a BN operation, and a sigmoid activation.
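To make the block structure concrete, the following sketch shows a plain convolutional block and the 2×2 down- and up-sampling operations, assuming a PyTorch implementation; the released code at https://github.com/wmuLei/EyeBlinking may be organized differently.

```python
# A minimal sketch of the plain convolutional block (PCB), assuming PyTorch.
import torch.nn as nn

class PlainConvBlock(nn.Module):
    """Two stacked (3x3 conv -> BN -> ReLU) layers, as described for the PCB."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Down- and up-sampling operations used between levels (feature widths 32, 64, 128).
downsample = nn.MaxPool2d(kernel_size=2)                                        # MaxPooling2x2
upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)    # UpSampling2x2
```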
Fig. 3.

The architecture of the TAU-Net for extracting palpebral fissures from the frame images of eye videography.
2.3. Subtraction blocks
There are three different SCBs in the developed network: the short, medium, and long SCBs. These SCBs have different internal structures and aim to detect various subtle structural textures associated with target objects, making the TAU-Net more sensitive to the objects, especially their boundaries. Specifically, the short SCB is formed by two element-wise subtractions, an absolute value operation, and a sigmoid activation (Fig. 3), and is used to assess the internal texture changes between two adjacent convolutional layers within a PCB. The long SCB has the same structure as the short one but operates on two different PCBs at the same level rather than on two adjacent convolutional layers within a PCB, making the outputs of the short and long SCBs very different. The medium SCB consists of a sampling operation, two 1×1 spatial convolutions with BN and ReLU, and a short SCB. The sampling operation and spatial convolutions ensure that the input features have the same dimensions, so that the short SCB can measure the texture changes between two PCBs at different levels (Fig. 3). Since the U-Net uses MaxPooling2×2 and UpSampling2×2 in the encoding and decoding procedures, the sampling operation takes two different forms (i.e., average pooling and bilinear interpolation) for these two procedures, respectively. With these three subtraction blocks, the developed network is expected to capture different types of structural textures associated with target objects and thus handle the boundary regions of desirable objects more properly than the U-Net.
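The sketch below illustrates one plausible reading of the short and medium SCBs in PyTorch. The exact wiring of the two subtractions, the absolute value, and the sigmoid, as well as the placement of the 1×1 convolutions and the resampling, is our interpretation of the description above and of Fig. 9, and may differ from the official implementation.

```python
# A hedged sketch of the short and medium subtraction blocks (SCBs), assuming PyTorch.
import torch
import torch.nn as nn

class ShortSCB(nn.Module):
    """Highlights texture changes between two feature maps of equal shape."""
    def forward(self, feat_a, feat_b):
        diff = feat_a - feat_b              # first element-wise subtraction
        texture = diff - torch.abs(diff)    # second subtraction against the absolute value
        return torch.sigmoid(texture)       # sigmoid weighting of the detected textures

class MediumSCB(nn.Module):
    """Aligns features from PCBs at different levels before applying a short SCB."""
    def __init__(self, in_a, in_b, out_channels, decoder=True):
        super().__init__()
        # Bilinear interpolation in the decoder path, average pooling in the encoder path.
        self.resample = (nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
                         if decoder else nn.AvgPool2d(kernel_size=2))
        self.proj_a = nn.Sequential(nn.Conv2d(in_a, out_channels, kernel_size=1),
                                    nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True))
        self.proj_b = nn.Sequential(nn.Conv2d(in_b, out_channels, kernel_size=1),
                                    nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True))
        self.short = ShortSCB()

    def forward(self, feat_a, feat_b):
        feat_b = self.resample(feat_b)      # bring both inputs to the same spatial size
        return self.short(self.proj_a(feat_a), self.proj_b(feat_b))
```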
2.4. Network training
We trained the developed TAU-Net from scratch on the collected dataset using the Dice index [27] as the loss function. The Dice measured the overlap between the computer result and the manual annotation for a given image region [28], as defined by:
$$\mathrm{Dice} = \frac{2T_p}{2T_p + F_p + F_n} \tag{1}$$
where Tp, Fp, and Fn indicate the numbers of true positive, false positive, and false negative pixels, respectively. The Dice was optimized by the RMSprop algorithm [29], with an initial learning rate (iLR) of 0.001, a discounting factor of 0.9, a batch size of 8, and an epoch number of 100. The optimization procedure was stopped if the Dice did not improve for 20 consecutive epochs. To prevent over-fitting, the input frame images were augmented using multiple random transformations, such as flips and contrast, affine, and elastic transformations. After the optimization procedure, the output probability map was thresholded at 0.5 to generate the final segmentation result for the target objects [30].
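A hedged sketch of the loss and optimizer settings described above is given below, again assuming PyTorch; the placeholder model and random tensors only stand in for the TAU-Net and the training data.

```python
# Soft Dice loss and RMSprop settings (iLR 0.001, discounting factor 0.9, batch size 8).
import torch
import torch.nn as nn

def dice_loss(pred, target, eps=1e-6):
    """1 - 2*Tp / (2*Tp + Fp + Fn), computed from per-pixel probabilities."""
    pred = pred.reshape(pred.size(0), -1)
    target = target.reshape(target.size(0), -1)
    intersection = (pred * target).sum(dim=1)            # soft true positives
    dice = (2 * intersection + eps) / (pred.sum(dim=1) + target.sum(dim=1) + eps)
    return 1 - dice.mean()

model = nn.Conv2d(1, 1, kernel_size=3, padding=1)        # placeholder for the TAU-Net
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3, alpha=0.9)

probs = torch.sigmoid(model(torch.rand(8, 1, 256, 256)))  # one batch of 8 frames
masks = (torch.rand(8, 1, 256, 256) > 0.5).float()        # placeholder annotations
loss = dice_loss(probs, masks)
loss.backward()
segmentation = (probs > 0.5).float()                       # final 0.5 thresholding
```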
2.5. Performance validation
We comprehensively assessed the performance of the developed TAU-Net for extracting palpebral fissures from the video frame images using four different metrics: the Dice, intersection over union (IOU) [31], structural similarity (SSIM) [32], and Hausdorff distance (HD, in pixels) [27].
$$\mathrm{IOU} = \frac{T_p}{T_p + F_p + F_n} \tag{2}$$

$$\mathrm{SSIM} = \frac{(2\mu_n\mu_m + c_1)(2\sigma_{nm} + c_2)}{(\mu_n^2 + \mu_m^2 + c_1)(\sigma_n^2 + \sigma_m^2 + c_2)} \tag{3}$$

$$\mathrm{HD}(A, B) = \max\big(h(A, B),\; h(B, A)\big), \qquad h(A, B) = \max_{a \in A}\,\min_{b \in B}\,\lVert a - b \rVert \tag{4}$$
where μn, μm, σn, σm, and σnm denote the intensity means, standard deviations, and cross-covariance of the segmentation results and manual annotations, respectively; c1 and c2 are two regularization constants; and h(A, B) is the directed HD from point set A to point set B. A larger IOU or SSIM, or a smaller HD, indicates higher performance. Among these metrics, the IOU assessed the differences between the segmentation results and the manual annotations and was used to reflect the overall performance.
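For illustration, the four metrics could be computed for a single binary mask pair as in the sketch below. The paper does not state its evaluation tooling, so NumPy, SciPy, and scikit-image are assumptions here, and the HD is computed over the foreground pixel coordinates.

```python
# Hedged sketch of the Dice, IOU, SSIM, and HD for one segmentation/annotation pair.
import numpy as np
from scipy.spatial.distance import directed_hausdorff
from skimage.metrics import structural_similarity

def evaluate(seg, gt):
    seg, gt = seg.astype(bool), gt.astype(bool)
    tp = np.logical_and(seg, gt).sum()
    fp = np.logical_and(seg, ~gt).sum()
    fn = np.logical_and(~seg, gt).sum()
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    ssim = structural_similarity(seg.astype(float), gt.astype(float), data_range=1.0)
    a, b = np.argwhere(seg), np.argwhere(gt)             # foreground point sets
    hd = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
    return dice, iou, ssim, hd
```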
Based on these metrics, we ran the TAU-Net six times, using the three subsets of the collected dataset in different orders for training, validation, and testing, and then compared it with several existing networks (i.e., the U-Net [23], Attention U-Net (AT-Net) [33], Asymmetric U-Net (AS-Net) [34], BiO-Net [35], BG-CNN [36], and DeepLabv3 [24] with Xception as the backbone) under the same experimental configurations (e.g., the RMSprop optimizer, iLR, and number of epochs) and the same datasets (i.e., training, validation, and testing subsets). In addition, we evaluated the impact of the introduced SCBs on the segmentation performance by removing different components from the developed network, yielding five variants termed TAU-Net1, TAU-Net2, TAU-Net3, TAU-Net4, and TAU-Net5. The TAU-Net1, TAU-Net2, and TAU-Net3 were defined by removing the long SCB, the medium SCB, and both the medium and long SCBs, respectively, from the developed network; the TAU-Net4 was defined by removing the sigmoid activation from the short SCB; and the TAU-Net5 was constructed by replacing the second subtraction operation in the short SCB with an element-wise addition operation. The paired sample t-test was performed to statistically assess the performance differences among the involved networks, with a p-value less than 0.05 considered statistically significant.
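The paired sample t-test mentioned above can be run on per-image scores of two networks as in the minimal sketch below; the arrays are placeholders, not the study's data.

```python
# Minimal paired-sample t-test between per-image Dice scores of two networks.
import numpy as np
from scipy.stats import ttest_rel

dice_tau_net = np.array([0.95, 0.96, 0.94, 0.97])   # per-image Dice of TAU-Net (placeholder)
dice_u_net = np.array([0.94, 0.95, 0.93, 0.96])     # per-image Dice of U-Net (placeholder)
t_stat, p_value = ttest_rel(dice_tau_net, dice_u_net)
significant = p_value < 0.05                         # significance threshold used in the paper
```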
After this comprehensive performance validation, incomplete blinking can be identified by extracting palpebral fissures from each frame of eye videography, measuring their maximum width (Fig. 4), and converting the videography into width curves with respect to the frame images. These width curves enable clinical ophthalmologists to accurately profile each blink cycle and quickly identify incomplete blinks.
Fig. 4.

Illustration of the maximum width (in blue) of palpebral fissures for identifying incomplete blinking.
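As a concrete illustration of this measurement, the sketch below converts per-frame segmentation masks into a width curve. It assumes that the maximum width in Fig. 4 corresponds to the largest vertical extent of the fissure mask in any image column; this reading, and the helper names, are ours.

```python
# Illustrative conversion of per-frame masks into a width curve (one value per frame).
import numpy as np

def max_fissure_width(mask):
    """Largest vertical extent (in pixels) of the palpebral-fissure mask."""
    mask = np.asarray(mask, dtype=bool)
    width = 0
    for col in mask.T:                      # scan each image column
        rows = np.flatnonzero(col)
        if rows.size:
            width = max(width, rows[-1] - rows[0] + 1)
    return width

def width_curve(masks):
    """One width value per frame, giving the curve plotted against frame index."""
    return np.array([max_fissure_width(m) for m in masks])
```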
3. Results
3.1. Segmentation of palpebral fissures
Table 1 showed the performance of the developed TAU-Net with different iLRs for the RMSprop algorithm in the first experiment on the collected dataset. These results indicated that the TAU-Net achieved the best performance when the iLR was set to 0.001.
Table 1.
The results of the developed network with different iLRs for the RMSprop algorithm in the first experiment (referred to as Test1) in terms of the mean and standard deviation (SD) of the Dice, IOU, SSIM, and HD, respectively.
| iLR | Dice (Mean ± SD) | IOU (Mean ± SD) | SSIM (Mean ± SD) | HD (Mean ± SD) |
|---|---|---|---|---|
| 0.01 | 0.9437 ± 0.1027 | 0.9048 ± 0.1210 | 0.9446 ± 0.0264 | 5.7626 ± 1.7082 |
| 0.001 | 0.9493 ± 0.0932 | 0.9137 ± 0.1183 | 0.9485 ± 0.0288 | 5.6236 ± 1.7280 |
| 0.0001 | 0.9489 ± 0.0885 | 0.9117 ± 0.1110 | 0.9451 ± 0.0315 | 5.6265 ± 1.6282 |
Table 2 presented the segmentation results of the developed TAU-Net on the collected dataset. The TAU-Net achieved an average Dice, IOU, SSIM, and HD of 0.9448, 0.9105, 0.9521, and 5.4304, respectively, which is comparable to manual annotation. Fig. 5 showed the segmentation results of four frame images obtained by the TAU-Net and their differences from the manual annotations.
Table 2.
The performance of the TAU-Net for palpebral fissures on the collected dataset in terms of the mean and standard deviation (SD) of Dice, IOU, SSIM, and HD, respectively.
| Test | Dice (Mean ± SD) | IOU (Mean ± SD) | SSIM (Mean ± SD) | HD (Mean ± SD) |
|---|---|---|---|---|
| Test1 | 0.9493 ± 0.0932 | 0.9137 ± 0.1183 | 0.9485 ± 0.0288 | 5.6236 ± 1.7280 |
| Test2 | 0.9645 ± 0.0919 | 0.9402 ± 0.1017 | 0.9590 ± 0.0147 | 5.0880 ± 1.4612 |
| Test3 | 0.9586 ± 0.0954 | 0.9303 ± 0.1089 | 0.9529 ± 0.0166 | 5.1659 ± 1.5792 |
| Test4 | 0.9142 ± 0.1568 | 0.8671 ± 0.1784 | 0.9439 ± 0.0232 | 5.9737 ± 1.9458 |
| Test5 | 0.9264 ± 0.1570 | 0.8880 ± 0.1763 | 0.9549 ± 0.0197 | 5.3718 ± 1.6704 |
| Test6 | 0.9558 ± 0.0851 | 0.9239 ± 0.1070 | 0.9535 ± 0.0254 | 5.3596 ± 1.5821 |
| Overall | 0.9448 ± 0.1189 | 0.9105 ± 0.1381 | 0.9521 ± 0.0225 | 5.4304 ± 1.6943 |
Fig. 5.

Four randomly chosen video frame images (first row) and their segmentation results obtained by an expert (second row) and the TAU-Net (third row), respectively.
3.2. Performance comparison
Table 3 summarized the segmentation results of the TAU-Net and six other existing networks on the collected dataset. The TAU-Net achieved an average Dice and HD of 0.9448 and 5.4304, respectively. It performed similarly to the AT-Net (0.9437 and 5.3439) (p = 0.8295) and outperformed the U-Net (0.9386 and 5.4431) (p < 0.001), AS-Net (0.9424 and 5.4698) (p = 0.0219), BiO-Net (0.9431 and 5.3817) (p = 0.0121), BG-CNN (0.9433 and 5.4477) (p = 0.0924), and DeepLabv3 (0.9379 and 5.6707) (p < 0.001). The examples in Fig. 6 visually displayed the performance differences among the involved networks on five test images.
Table 3.
The performance of the developed network and six other segmentation networks on the collected dataset.
| Method | Dice (Mean ± SD) | IOU (Mean ± SD) | SSIM (Mean ± SD) | HD (Mean ± SD) |
|---|---|---|---|---|
| U-Net | 0.9386 ± 0.1362 | 0.9037 ± 0.1541 | 0.9509 ± 0.0257 | 5.4431 ± 1.7071 |
| AT-Net | 0.9437 ± 0.1266 | 0.9103 ± 0.1444 | 0.9529 ± 0.0242 | 5.3439 ± 1.6841 |
| AS-Net | 0.9424 ± 0.1262 | 0.9080 ± 0.1445 | 0.9512 ± 0.0251 | 5.4698 ± 1.7321 |
| BiO-Net | 0.9431 ± 0.1222 | 0.9083 ± 0.1424 | 0.9516 ± 0.0234 | 5.3817 ± 1.6866 |
| BG-CNN | 0.9433 ± 0.1212 | 0.9083 ± 0.1399 | 0.9513 ± 0.0237 | 5.4477 ± 1.6933 |
| DeepLabv3 | 0.9379 ± 0.1214 | 0.8991 ± 0.1435 | 0.9464 ± 0.0261 | 5.6707 ± 1.7150 |
| TAU-Net | 0.9448 ± 0.1189 | 0.9105 ± 0.1381 | 0.9521 ± 0.0225 | 5.4304 ± 1.6943 |
Fig. 6.

Performance comparison among the involved networks. The first two rows are the given frame images and their manual annotations; the last seven rows are the segmentation results obtained by the U-Net, AT-Net, AS-Net, BiO-Net, BG-CNN, DeepLabv3, and TAU-Net, respectively.
Table 4 showed the segmentation performance of the involved networks on the additional testing dataset. The developed network was slightly inferior to the AT-Net, AS-Net, and BiO-Net in terms of the IOU, but superior to the other networks (i.e., the U-Net, BG-CNN, and DeepLabv3). Moreover, our network had the smallest HD among the involved networks.
Table 4.
The performance of the developed network and six other segmentation networks on the additional testing dataset.
| Method | Dice (Mean ± SD) | IOU (Mean ± SD) | SSIM (Mean ± SD) | HD (Mean ± SD) |
|---|---|---|---|---|
| U-Net | 0.9852 ± 0.0311 | 0.9722 ± 0.0427 | 0.9511 ± 0.0258 | 4.2416 ± 1.0680 |
| AT-Net | 0.9885 ± 0.0099 | 0.9774 ± 0.0178 | 0.9545 ± 0.0134 | 4.3027 ± 0.9369 |
| AS-Net | 0.9888 ± 0.0099 | 0.9780 ± 0.0175 | 0.9559 ± 0.0137 | 4.4049 ± 0.8667 |
| BiO-Net | 0.9882 ± 0.0118 | 0.9768 ± 0.0208 | 0.9551 ± 0.0152 | 4.1252 ± 0.8020 |
| BG-CNN | 0.9851 ± 0.0224 | 0.9715 ± 0.0345 | 0.9507 ± 0.0249 | 4.0039 ± 0.8750 |
| DeepLabv3 | 0.9844 ± 0.0270 | 0.9704 ± 0.0400 | 0.9484 ± 0.0287 | 4.7183 ± 1.3477 |
| TAU-Net | 0.9863 ± 0.0150 | 0.9733 ± 0.0253 | 0.9511 ± 0.0174 | 3.9812 ± 0.7955 |
Table 5 summarized the paired t-test results of the involved networks on the two different segmentation datasets. As demonstrated by the table, the TAU-Net achieved an average IOU of 0.9315 over the collected dataset (Table 3) and the additional testing dataset (Table 4). It performed similarly to the AT-Net (0.9327) (p = 0.0796), AS-Net (0.9313) (p = 0.8444), and BiO-Net (0.9312) (p = 0.6376), but outperformed the U-Net (0.9266) (p < 0.001), BG-CNN (0.9294) (p = 0.0253), and DeepLabv3 (0.9229) (p < 0.001).
Table 5.
The paired t-tests of the developed network and six other segmentation networks on the collected dataset and additional testing dataset (‘–’ denotes no statistical analysis).
| Method | U-Net | AT-Net | AS-Net | BiO-Net | BG-CNN | DeepLabv3 | TAU-Net |
|---|---|---|---|---|---|---|---|
| U-Net | – | | | | | | |
| AT-Net | <0.001 | – | | | | | |
| AS-Net | <0.001 | 0.0993 | – | | | | |
| BiO-Net | <0.001 | 0.0234 | 0.8523 | – | | | |
| BG-CNN | 0.0064 | <0.001 | 0.0483 | 0.0369 | – | | |
| DeepLabv3 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | – | |
| TAU-Net | <0.001 | 0.0796 | 0.8444 | 0.6376 | 0.0253 | <0.001 | – |
3.3. Effect of the SCBs
To assess the impact of the introduced SCBs on the segmentation performance, we also compared the developed TAU-Net with its five variants using the same experimental configurations and datasets (Tables 6–8). The TAU-Net achieved an average Dice and IOU of 0.9587 and 0.9315, respectively, on the two datasets. It was better than the TAU-Net1 (0.9573 and 0.9230) (p = 0.0365), TAU-Net2 (0.9564 and 0.9285) (p < 0.001), TAU-Net3 (0.9575 and 0.9311) (p = 0.6286), TAU-Net4 (0.9580 and 0.9310) (p = 0.5348), and TAU-Net5 (0.9532 and 0.9254) (p < 0.001). Fig. 7 illustrated the performance differences among the TAU-Net and its five variants.
Table 6.
The performance of the TAU-Net and its five variants on the collected dataset.
| Method | Dice (Mean ± SD) | IOU (Mean ± SD) | SSIM (Mean ± SD) | HD (Mean ± SD) |
|---|---|---|---|---|
| TAU-Net | 0.9448 ± 0.1189 | 0.9105 ± 0.1381 | 0.9521 ± 0.0225 | 5.4304 ± 1.6943 |
| TAU-Net1 | 0.9427 ± 0.1239 | 0.9081 ± 0.1439 | 0.9521 ± 0.0244 | 5.4006 ± 1.6841 |
| TAU-Net2 | 0.9413 ± 0.1252 | 0.9059 ± 0.1453 | 0.9505 ± 0.0246 | 5.3654 ± 1.6850 |
| TAU-Net3 | 0.9421 ± 0.1288 | 0.9081 ± 0.1476 | 0.9521 ± 0.0244 | 5.4001 ± 1.7190 |
| TAU-Net4 | 0.9434 ± 0.1236 | 0.9091 ± 0.1424 | 0.9520 ± 0.0228 | 5.4347 ± 1.6849 |
| TAU-Net5 | 0.9365 ± 0.1402 | 0.9011 ± 0.1589 | 0.9502 ± 0.0263 | 5.4887 ± 1.7679 |
Table 8.
The paired t-tests of the TAU-Net and its five variants on the collected dataset and additional testing dataset (‘–’ denotes no statistical analysis).
| Method | TAU-Net | TAU-Net1 | TAU-Net2 | TAU-Net3 | TAU-Net4 | TAU-Net5 |
|---|---|---|---|---|---|---|
| TAU-Net | – | | | | | |
| TAU-Net1 | 0.0365 | – | | | | |
| TAU-Net2 | <0.001 | 0.0507 | – | | | |
| TAU-Net3 | 0.6286 | 0.1778 | <0.001 | – | | |
| TAU-Net4 | 0.5348 | 0.2083 | 0.0015 | 0.9051 | – | |
| TAU-Net5 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | – |
Fig. 7.

The results obtained by the TAU-Net, TAU-Net1, TAU-Net2, TAU-Net3, TAU-Net4, and TAU-Net5, respectively, for the video frame images displayed in Fig. 6.
3.4. Identification of incomplete blinking
As demonstrated by the above experiments, the developed network was capable of accurately extracting palpebral fissures from the frame images of the collected videos. With the extracted palpebral fissures, we can measure their maximum width and convert these videos into width curves with respect to the frame images (Fig. 8) for assessing incomplete blinking. Note that the width ranges from 0 to 256 pixels for each frame because the frame images were resized to 256×256 for the developed network. Since there are no clear criteria to detect incomplete blinks and distinguish them from complete ones, ophthalmologists in clinical settings empirically define an incomplete blink as one in which the descending upper eyelid covers between 30% and 75% of the cornea [5]. Based on this criterion, we compared the performance of an ophthalmologist (with about 5 years of experience) in identifying incomplete blinking with and without the help of the developed network and summarized the differences in Table 9. As shown by the table, when using the developed network the ophthalmologist took an average of 5.61 minutes per video over the ten videos and identified a total of 447 incomplete blinks, compared with 17.4 minutes and 473 incomplete blinks without it (p < 0.01). This difference in identification time and count demonstrated a clear advantage of the developed network in distinguishing complete blinks, incomplete blinks, and minor eyelid twitches.
Fig. 8.

The width curve of palpebral fissures obtained by the developed TAU-Net for each frame of a given video and its close-up in a green rectangle. The width was denoted as a red point for each frame, and the blue curve depicted the movement of the upper eyelid for the whole video.
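As one possible way of operationalizing the 30%-75% coverage criterion on the width curve, the following heuristic sketch flags incomplete blinks. It assumes that the fraction of the open-eye width lost at the bottom of a dip approximates the corneal coverage; the baseline estimate, prominence threshold, and labels are illustrative rather than the procedure used by the ophthalmologist.

```python
# Heuristic blink classification from the per-frame width curve (illustrative only).
import numpy as np
from scipy.signal import find_peaks

def classify_blinks(widths, cover_lo=0.30, cover_hi=0.75):
    widths = np.asarray(widths, dtype=float)
    baseline = np.percentile(widths, 95)                # typical open-eye width
    # Blinks appear as prominent dips in the width curve.
    minima, _ = find_peaks(-widths, prominence=0.2 * baseline)
    labels = []
    for idx in minima:
        coverage = 1.0 - widths[idx] / baseline          # fraction of the opening covered
        if coverage >= cover_hi:
            labels.append((idx, "complete"))             # near-full closure (assumption)
        elif coverage >= cover_lo:
            labels.append((idx, "incomplete"))           # 30%-75% coverage band
        else:
            labels.append((idx, "minor twitch"))         # small eyelid twitches are ignored
    return labels
```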
Table 9.
The performance differences of an experienced ophthalmologist with and without using the developed network for identifying incomplete blinking, based on ten videos. ‘Num’ and ‘Time’ denote the number of incomplete blinks in a given video and the detection time (in minutes), respectively.
| Video | Num (manual) | Time (manual, min) | Num (computer-assisted) | Time (computer-assisted, min) |
|---|---|---|---|---|
| Video01 | 43 | 20.1 | 42 | 5.7 |
| Video02 | 42 | 22.0 | 25 | 5.0 |
| Video03 | 35 | 15.8 | 36 | 5.3 |
| Video04 | 38 | 18.4 | 28 | 6.7 |
| Video05 | 32 | 16.9 | 40 | 5.0 |
| Video06 | 39 | 13.7 | 29 | 5.2 |
| Video07 | 29 | 14.3 | 40 | 5.5 |
| Video08 | 90 | 18.5 | 92 | 6.1 |
| Video09 | 75 | 20.4 | 70 | 5.2 |
| Video10 | 50 | 14.1 | 45 | 6.4 |
| Mean | 47.3 | 17.4 | 44.7 | 5.6 |
4. Discussion
We developed a texture-aware neural network based on the U-Net (termed TAU-Net) for segmenting palpebral fissures from frame images of eye videography and assisting in identifying incomplete blinking. The developed TAU-Net demonstrated promising performance in capturing various subtle structural textures associated with target objects due to the introduction of the short, medium, and long SCBs. This underlying design made our network sensitive to the objects and capable of effectively preventing information loss caused by multiple down-sampling operations. We comprehensively assessed the performance of the TAU-Net and compared it with several available networks based on the collected dataset and additional testing dataset. Our quantitative assessment showed that the developed network achieved promising performance in segmenting palpebral fissures from frame images and could reliably identify incomplete blinking.
When compared with other available networks, the developed network obtained an average IOU and HD of 0.9315 and 4.9462, respectively, on the two datasets. Its overall performance was similar to that of the AT-Net (0.9327 and 4.9961) but better than the U-Net (0.9266 and 5.0417), AS-Net (0.9313 and 5.1140), BiO-Net (0.9312 and 0.9620), BG-CNN (0.9294 and 4.9654), and DeepLabv3 (0.9229 and 5.3525) on the same two datasets. Despite the promising performance, our network showed relatively low segmentation stability across datasets: it had the highest segmentation accuracy on the collected dataset (Table 3) but only the fourth highest on the additional testing dataset (Table 4). This suggested that our network is particularly suited to the specific problem addressed in this study.
The introduced SCBs played important roles in enhancing the sensitivity of the developed network to target objects and their boundaries. These SCBs employed different internal structures to quantify the texture changes between encoding and decoding convolutional blocks at different levels and thus affected the performance of the developed network differently. Specifically, the short SCB was superior to the long one, and both were superior to the medium SCB, as verified by the performance of the U-Net, TAU-Net (i.e., U-Net with the short, medium, and long SCBs), TAU-Net1 (i.e., U-Net with the short and medium SCBs), TAU-Net2 (i.e., U-Net with the short and long SCBs), and TAU-Net3 (i.e., U-Net with the short SCB) on the two datasets (Tables 6 and 7). In addition, we assessed the impact of the sigmoid activation and the second subtraction operation in the introduced SCBs according to the performance of the TAU-Net, TAU-Net4, and TAU-Net5. Experimental results demonstrated that the sigmoid activation increased the learning potential of the PCBs by assigning larger weights to some valuable textures, since the TAU-Net performed better than the TAU-Net4 (i.e., TAU-Net without the sigmoid in the SCBs). The subtraction operations reduced the influence of irrelevant background and improved the performance of the developed network, since the TAU-Net was better than the TAU-Net5 (i.e., TAU-Net with addition in place of the second subtraction). These subtraction operations can detect small texture changes between different convolutional features. Such changes are generally associated with edge information, owing to the characteristics of convolutional operations (Fig. 9), and are thus helpful for locating desirable objects, as analyzed in previous studies [34,36].
Table 7.
The performance of the TAU-Net and its five variants on the additional testing dataset.
| Method | Dice (Mean ± SD) | IOU (Mean ± SD) | SSIM (Mean ± SD) | HD (Mean ± SD) |
|---|---|---|---|---|
| TAU-Net | 0.9863 ± 0.0150 | 0.9733 ± 0.0253 | 0.9511 ± 0.0174 | 3.9812 ± 0.7955 |
| TAU-Net1 | 0.9864 ± 0.0135 | 0.9736 ± 0.0231 | 0.9519 ± 0.0168 | 4.2132 ± 0.8767 |
| TAU-Net2 | 0.9863 ± 0.0167 | 0.9735 ± 0.0284 | 0.9516 ± 0.0195 | 4.0295 ± 0.7792 |
| TAU-Net3 | 0.9882 ± 0.0152 | 0.9770 ± 0.0252 | 0.9551 ± 0.0174 | 4.3376 ± 0.9198 |
| TAU-Net4 | 0.9870 ± 0.0139 | 0.9747 ± 0.0244 | 0.9523 ± 0.0178 | 4.3330 ± 1.0145 |
| TAU-Net5 | 0.9866 ± 0.0178 | 0.9740 ± 0.0305 | 0.9514 ± 0.0205 | 4.3995 ± 1.0786 |
Fig. 9.

(a)-(c) are a given frame image and its two different filtered versions obtained by the same Gaussian filter with a standard deviation of 5, respectively; (d) is the difference between (b) and (c); (e) and (f) are the subtraction and addition between image (d) and its absolute value, respectively.
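The effect illustrated in Fig. 9 can be reproduced with a few lines of NumPy/SciPy: differencing two smoothed versions of a frame highlights edge-like textures, and subtracting or adding the absolute value keeps only one sign of that response. How the two filtered versions in Fig. 9(b)-(c) were obtained is not fully specified, so the repeated filtering below is an assumption.

```python
# Illustrative reproduction of the subtraction effect shown in Fig. 9.
import numpy as np
from scipy.ndimage import gaussian_filter

image = np.random.rand(256, 256)                        # stand-in for a video frame
smooth_once = gaussian_filter(image, sigma=5)           # first filtered version (assumed)
smooth_twice = gaussian_filter(smooth_once, sigma=5)    # second filtered version (assumed)

diff = smooth_once - smooth_twice                       # edge-like texture response (cf. Fig. 9d)
sub = diff - np.abs(diff)                               # keeps only negative responses (cf. Fig. 9e)
add = diff + np.abs(diff)                               # keeps only positive responses (cf. Fig. 9f)
```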
Finally, we assessed the feasibility and effectiveness of the developed network in identifying incomplete blinking from the collected videos by comparing the performance of an experienced ophthalmologist with and without the help of our developed network. Our experiments demonstrated that the developed network could assist in identifying incomplete blinking accurately and greatly reduce the identification time. This is primarily because our network clearly showed each blink cycle and simplified the frame-by-frame analysis of eye videos. The identification also avoided the influence of many minor twitches of the upper eyelid, which generally covered less than 30% of the cornea (Fig. 8). However, the identification remained a relatively subjective procedure because (1) complete and incomplete blinks are very similar in terms of blink interval, duration, and speed, leading to relatively high inter- and intra-observer variability in their identification; (2) there are no clear indicators in the literature for an algorithm to automatically detect or distinguish the two types of blinking, so the identification indicators are often defined empirically; and (3) identification based only on the segmentation results obtained by a given network inevitably contains errors. This suggested that subjective assessment and empirical indicators play important roles in identifying incomplete blinking, thus making it impossible to objectively compare the identification performance of these networks.
5. Conclusion
We developed a texture-aware segmentation network to extract palpebral fissures from frame images of eye videography and identify incomplete blinking. The novelty of the network lies in the integration of three unique subtraction convolutional blocks with the U-Net. This integration led to a close connection between the multi-level encoding and decoding convolutional blocks of the U-Net and improved their capability to capture subtle structural textures through the subtraction operations. These subtle textures were generally associated with desirable objects and their boundaries, thereby improving the detection accuracy of the developed network. Extensive experiments on two different datasets showed that our developed network had promising potential to suppress irrelevant background and motion artifacts and obtained better performance than several existing networks, including the U-Net, Asymmetric U-Net, BiO-Net, BG-CNN, and DeepLabv3.
Acknowledgments
This work is supported in part by Wenzhou Science and Technology Bureau (Grant No. 2020Y1534 and Y2020035) and National Natural Science Foundation of China (Grant No. 62006175).
Footnotes
CRediT authorship contribution statement
Qinxiang Zheng, Xin Zhang, and Juan Zhang: Data collection, Manual annotation, Writing. Xin Zhang, Furong Bai, Shenghai Huang: Software, Visualization, Draft preparation. Jiantao Pu and Wei Chen: Methodology, Validation, Discussion. Jiantao Pu and Lei Wang: Conceptualization, Revising manuscript, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- [1].Rodriguez J, Lane K, Ousler G, Angjeli E, Smith L, Abelson M, Blink: characteristics, controls, and relation to dry eyes, Curr. Eye Res. 43 (1) (2017) 52–66. [DOI] [PubMed] [Google Scholar]
- [2].Stern J, Boyer D, Schroeder D, Blink rate: a possible measure of fatigue, Hum. Factors 36 (2) (1994) 285–297. [DOI] [PubMed] [Google Scholar]
- [3].Harrison W, Begley C, Liu H, Chen M, Garcia M, Smith J, Menisci and fullness of the blink in dry eye, Optom. Vis. Sci 85 (8) (2008) 706–714. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Korb D, Baron D, Herman J, Finnemore V, Exford J, Hermosa J, Leahy C, Glonek T, Greiner J, Tear film lipid layer thickness as a function of blinking, Cornea 13 (4) (1994) 354–359. [DOI] [PubMed] [Google Scholar]
- [5].Cardona G, Garcia C, Seres C, Vilaseca M, Gispets J, Blink rate, blink amplitude, and tear film integrity during dynamic visual display terminal tasks, Curr. Eye Res. 36 (3) (2011) 190–197. [DOI] [PubMed] [Google Scholar]
- [6].DeAngelis K, Rider A, Potter W, Jensen J, Fowler B, Fleming J, Eyelid spontaneous blink analysis and age-related changes through high-speed imaging, Ophthalmic Plast. Reconstr. Surg. 35 (5) (2019) 487–490. [DOI] [PubMed] [Google Scholar]
- [7].McMonnies C, Incomplete blinking: exposure keratopathy, lid wiper epitheliopathy, dry eye, refractive surgery, and dry contact lenses, Contact Lens and Anterior Eye 30 (1) (2007) 37–51. [DOI] [PubMed] [Google Scholar]
- [8].Argiles M, Cardona G, Cabre E, Rodriguez M, Blink rate and incomplete blinks in six different controlled hard-copy and electronic reading conditions, Invest. Ophthalmol. Vis. Sci. 56 (11) (2015) 6679–6685. [DOI] [PubMed] [Google Scholar]
- [9].Ousler G, Abelson M, Johnston P, Rodriguez J, Lane K, Smith L, Blink patterns and lid-contact times in dry-eye and normal subjects, Clinical Ophthalmology 8 (2014) 869–874. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Wan T, Jin X, Lin L, Xu Y, Zhao Y, Incomplete blinking may attribute to the development of meibomian gland dysfunction, Curr. Eye Res. 41 (2) (2016) 179–185. [DOI] [PubMed] [Google Scholar]
- [11].Kim Y, Kwak A, Lee S, Yoon J, Jang S, Meibomian gland dysfunction in Graves’ orbitopathy, Can. J. Ophthalmol. 50 (4) (2015) 278–282. [DOI] [PubMed] [Google Scholar]
- [12].Fogelton A, Benesova W, Eye blink completeness detection, Comput. Vis. Image Underst. 176 (2018) 78–85. [Google Scholar]
- [13].Hershman R, Henik A, Cohen N, A novel blink detection method based on pupillometry noise, Behav. Res. Methods 50 (1) (2018) 107–114. [DOI] [PubMed] [Google Scholar]
- [14].Jie Y, Sella R, Feng J, Gomez M, Afshari N, Evaluation of incomplete blinking as a measurement of dry eye disease, The Ocular Surface 17 (3) (2019) 440–446. [DOI] [PubMed] [Google Scholar]
- [15].Wang M, Tien L, Han A, Lee J, Kim D, Markoulli M, Craig J, Impact of blinking on ocular surface and tear film parameters, The Ocular Surface 16 (4) (2018) 424–429. [DOI] [PubMed] [Google Scholar]
- [16].Espinosa J, Roig A, Perez J, Mas D, A high-resolution binocular video-oculography system: assessment of pupillary light reflex and detection of an early incomplete blink and an upward eye movement, Biomed. Eng Online 14 (22) (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Braun R, Smith P, Begley C, Li L, Gewecke N, Dynamics and function of the tear film in relation to the blink cycle, Prog. Retinal Eye Res. 45 (2015) 132–164. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Sua J, Lim S, Yulius M, Su X, Yapp E, Le N, Yeh H, Chua M, Incorporating convolutional neural networks and sequence graph transform for identifying multilabel protein Lysine PTM sites, Chemometrics and Intelligent Laboratory Systems 206 (15) (2020), 104171. [Google Scholar]
- [19].Le N, Ho Q, Yapp E, Ou Y, Yeh H, DeepETC: A deep convolutional neural network architecture for investigating and classifying electron transport chain’s complexes, Neurocomputing 375 (29) (2020) 71–79. [Google Scholar]
- [20].Rouast P, Adam M, Learning deep representations for video-based intake gesture detection, IEEE J. Biomed. Health. Inf. 24 (6) (2020) 1727–1737. [DOI] [PubMed] [Google Scholar]
- [21].Oprea S, Gonzalez P, Garcia A, Alejandro J, Escolano S, Rodriguez J, Argyro A, A review on deep learning techniques for video prediction, IEEE Trans. Pattern Anal. Mach. Intell. (2020), 10.1109/TPAMI.2020.3045007. [DOI] [PubMed] [Google Scholar]
- [22].Chen L, Papandreou G, Kokkinos I, Murphy K, Yuille A, DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Trans. Pattern Anal. Mach. Intell. 40 (4) (2018) 834–848. [DOI] [PubMed] [Google Scholar]
- [23].Ronneberger O, Fischer P, Brox T, U-Net: Convolutional Networks for Biomedical Image Segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015. [Google Scholar]
- [24].Chen L, Zhu Y, Papandreou G, Schroff F, Adam H, Encoder-decoder with atrous separable convolution for semantic image segmentation, European Conference on Computer Vision (ECCV) (2018) 838–851. [Google Scholar]
- [25].Zhou Z, Siddiquee M, Tajbakhsh N, Liang J, UNet++: redesigning skip connections to exploit multiscale features in image segmentation, IEEE Trans. Med. Imaging 39 (6) (2019) 1856–1867. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [26].Chang W, Lim J, Im C, An unsupervised eye blink artifact detection method for real-time electroencephalogram processing, Physiol. Meas 37 (3) (2016) 401–417. [DOI] [PubMed] [Google Scholar]
- [27].Wang L, Liu H, Lu Y, Chen H, Zhang J, Pu J, A coarse-to-fine deep learning framework for optic disc segmentation in fundus images, Biomed. Signal Process. Control 51 (2019) 82–89. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [28].Wang L, Zhu J, Sheng M, Cribb A, Zhu S, Pu J, Simultaneous segmentation and bias field estimation using local fitted images, Pattern Recogn. 74 (2018) 145–155. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [29].Mukkamala M, Hein M, Variants of RMSProp and Adagrad with Logarithmic Regret Bounds, International Conference on Machine Learning (2017). [Google Scholar]
- [30].Wang L, Shen M, Chang Q, Shi C, Zhu Y, Pu J, BG-CNN: A boundary guided convolutional neural network for corneal layer segmentation from optical coherence tomography, in: International Conference on Biomedical Signal and Image Processing (ICBIP), 2020, pp. 1–6. [Google Scholar]
- [31].Wang L, Shen M, Shi C.e., Zhou Y, Chen Y, Pu J, Chen H, EE-Net: An edge-enhanced deep learning network for jointly identifying corneal micro-layers from optical coherence tomography, Biomed. Signal Process. Control 71 (2022) 103213. [Google Scholar]
- [32].Wang Z, Bovik A, Sheikh H, Simoncelli E, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612. [DOI] [PubMed] [Google Scholar]
- [33].Oktay O, Schlemper J, Folgoc L, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla N, Kainz B, Glocker B, Rueckert D, Attention U-Net: Learning Where to Look for the Pancreas, Conference on Medical Imaging with Deep Learning (MIDL), 2018. [Google Scholar]
- [34].Wang L, Gu J, Chen Y, Liang Y, Zhang W, Pu J, Chen H, Automated segmentation of the optic disc from fundus images using an asymmetric deep learning network, Pattern Recogn. 112 (2021) 107810. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [35].Xiang T, Zhang C, Liu D, Song Y, Huang H, Cai W, BiO-Net: learning recurrent bi-directional connections for encoder-decoder architecture. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2020. [Google Scholar]
- [36].Wang L, Shen M, Chang Q, Shi C.e., Chen Y, Zhou Y, Zhang Y, Pu J, Chen H, Automated delineation of corneal layers on OCT images using a boundary-guided CNN, Pattern Recogn. 120 (2021) 108158. [DOI] [PMC free article] [PubMed] [Google Scholar]
