Author manuscript; available in PMC: 2024 Jul 1.
Published in final edited form as: Annu Int Conf IEEE Eng Med Biol Soc. 2023 Jul;2023:1–4. doi: 10.1109/EMBC40787.2023.10340785

Tensor-based Feature Extraction for Pupil Recognition in Cataract Surgery

Binh Duong Giap 1, Karthik Srinivasan 2, Ossama Mahmoud 3,4, Shahzad Ihsan Mian 5, Bradford Laurence Tannen 6, Nambi Nallasamy 7,8
PMCID: PMC10979349  NIHMSID: NIHMS1978258  PMID: 38082579

Abstract

Cataract surgery remains the definitive treatment for cataracts, which are a major cause of preventable blindness worldwide. Adequate and stable dilation of the pupil is necessary for the successful performance of cataract surgery. Pupillary instability is a known risk factor for cataract surgery complications, and accurate segmentation of the pupil from surgical video streams can enable the analysis of intraoperative pupil changes in cataract surgery. However, pupil segmentation performance can suffer due to variations in surgical illumination, obscuration of the pupil by surgical instruments, and intraoperative hydration of the lens material. To overcome these challenges, we present a novel method called tensor-based pupil feature extraction (TPFE) to improve the accuracy of pupil recognition systems. We analyzed the efficacy of this approach with experiments performed on a dataset of 4,560 annotated intraoperative images from 152 cataract surgeries in human patients. Our results indicate that TPFE can identify features relevant to pupil segmentation and that pupil segmentation with state-of-the-art deep learning models can be significantly improved with the TPFE method.

I. INTRODUCTION

Cataract surgery addresses the pathologic clouding of the natural lens of the eye through removal of the clouded lens material and replacement with an artificial intraocular lens (IOL) implant. It is one of the most commonly performed surgeries worldwide and is essential to addressing preventable blindness. More than 20 million cataract surgeries were performed worldwide in 2015, of which 3.6 million were in the United States of America and more than 4.2 million were in the European Union [1]. Cataracts most commonly occur as a natural part of the aging process, but can also occur due to trauma, medication side effects, and metabolic disorders. Despite improvements in cataract surgery technology and the phacoemulsification procedure, cataracts remain one of the leading causes of blindness worldwide.

Cataracts are accessed surgically by passing instruments through the pupil, the natural aperture in the iris tissue. The pupil is pharmacologically dilated at the time of surgery, but pupillary instability during surgery is a major risk factor for cataract surgery complications. As such, pupil recognition in cataract surgery is an essential task that allows surgeons to understand changes in pupil morphology during surgery. However, recognition performance is affected by many factors occurring during cataract surgery, such as variations in illumination, obscuration by surgical instruments, and hydration of cataractous lens material altering the appearance of the pupil itself. Recently, with the rapid development of deep learning, several studies [2]–[5] have proposed methods for pupil recognition in various scenarios, predominantly outside the context of cataract surgery.

Lee et al. [2] applied a UNet model to recognize and segment the pupil in nystagmography video, obtaining an accurate pupil trajectory for the diagnosis of various vestibular disorders. The group showed the potential of applying deep learning methods to diagnosing vestibular disorders, as the UNet model achieved a Dice coefficient of 94.85%. In other scenarios, Han et al. [3] proposed an indirect use of CNNs for pupil center detection: the pupil region is first segmented by CNN models, and the center of the segmented region is then determined. The authors demonstrated that the proposed method outperforms conventional methods that directly regress the pupil center. Gowroju et al. [4] presented a modified UNet model to improve the accuracy of pupil segmentation in non-surgical images. The proposed model achieved notable accuracy compared to the original UNet model while consuming less time.

In the field of cataract surgery, Sokolova et al. [5] trained and tested a Mask R-CNN on a small (82-image) dataset of cataract surgery frames and showed moderate performance in the pupil segmentation task. This study was limited by the size of its dataset and did not analyze performance across varying phases of surgery or in the context of pupil obscuration by a variety of surgical instruments.

In this paper, we propose a novel method called tensor-based pupil feature extraction (TPFE) for pupil recognition in cataract surgery using deep learning models. The proposed method is designed to extract features of the pupil region effectively with the goal of utilizing TPFE’s feature-rich output as input to pupil recognition systems. In order to study its effectiveness, we undertook experiments to evaluate the segmentation performance of a set of deep learning models with and without TPFE pre-processing. In the subsequent sections, we describe the TPFE method, report its impact on pupil segmentation performance in our experiments, and consider potential implications and future directions for this work.

II. METHOD

A. Tensor-based Pupil Image Representation

A tensor is defined as a multilinear mapping over a set of vector spaces [6]. Tensors represent a higher-order generalization of vectors and are commonly utilized for multidimensional data representation. Tensors are widely used to represent videos, color images, and hyperspectral images [7]–[9] through various data arrangements. To ensure clarity of notation, we first present the notation and terminology related to tensors used in this paper [10], [11].

  1. An $N$th-order tensor, denoted $\mathcal{P}$, is formally defined as an element of the tensor product of $N$ vector spaces, where the order $N$ of a tensor is the number of dimensions and $\mathcal{P} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$.

  2. Fibers are the higher-order analog of matrix rows and columns. A fiber of a tensor $\mathcal{P}$ is obtained by varying one index while fixing all the others. A third-order tensor $\mathcal{P}$ has column, row, and tube fibers, denoted $\mathbf{p}_{:ml}$, $\mathbf{p}_{n:l}$, and $\mathbf{p}_{nm:}$, respectively.

  3. Slices are two-dimensional sections of a tensor, defined by fixing all indices but two. A third-order tensor $\mathcal{P}$ has horizontal, lateral, and frontal slices, denoted $\mathbf{P}_{n::}$, $\mathbf{P}_{:m:}$, and $\mathbf{P}_{::l}$, respectively.

  4. A tensor $\mathcal{P} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ can be rearranged into a matrix in $N$ ways; this is called the mode-$k$ unfolding (or matricization) of $\mathcal{P}$ and is denoted $\mathbf{P}_{(k)}$, where $k \in \{1, 2, \ldots, N\}$.

  5. The mode-$k$ product is the result of the multiplication of a tensor by a matrix in mode $k$. The mode-$k$ product $\mathcal{P} \times_k U$ of a tensor $\mathcal{P} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with a matrix $U \in \mathbb{R}^{J \times I_k}$ is a tensor of size $I_1 \times \cdots \times I_{k-1} \times J \times I_{k+1} \times \cdots \times I_N$. Therefore, we have

$$(\mathcal{P} \times_k U)_{i_1 \cdots i_{k-1}\, j\, i_{k+1} \cdots i_N} = \sum_{i_k = 1}^{I_k} p_{i_1 i_2 \cdots i_N}\, u_{j i_k}. \quad (1)$$
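As an illustration (not part of the original paper), the mode-$k$ unfolding and mode-$k$ product can be sketched in NumPy; the helper names `unfold` and `mode_k_product` are ours:

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding: move mode k to the front, then flatten the
    remaining modes in Fortran order (first remaining mode varies fastest)."""
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order="F")

def mode_k_product(T, U, k):
    """Mode-k product T x_k U: contract the columns of U with mode k of T."""
    return np.moveaxis(np.tensordot(U, T, axes=(1, k)), 0, k)

# Sanity check of Eq. (1): unfolding the product equals U @ unfold(T, k).
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 5, 3))
U = rng.standard_normal((2, 5))            # acts on mode k = 1 (size 5)
P = mode_k_product(T, U, 1)                # resulting shape (4, 2, 3)
assert np.allclose(unfold(P, 1), U @ unfold(T, 1))
```

The identity in the final assertion holds for any fixed ordering of the remaining modes, since the mode-$k$ product acts only on mode $k$.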

In this study, to effectively exploit the relationship between color channels and spatial information, we construct a third-order tensor whose three frontal slices are the three color channels of a given color pupil image. In particular, a given color pupil image $I \in \mathbb{R}^{h \times w}$ of size $h \times w$ is split into $I_b$, $I_g$, and $I_r$, the three color-channel images in RGB color space. A third-order tensor $\mathcal{P} \in \mathbb{R}^{h \times w \times 3}$ of size $h \times w \times 3$ is then constructed by taking each color channel $I_c$, $c \in \{b, g, r\}$, as a frontal slice $\mathbf{P}_{::l}$, as depicted in Fig. 2.
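The construction above amounts to stacking the three color channels as frontal slices. A minimal NumPy sketch (the synthetic array stands in for a surgical frame; variable names are illustrative):

```python
import numpy as np

# A synthetic h x w x 3 color image standing in for a surgical frame.
h, w = 270, 480
image = np.random.default_rng(1).integers(0, 256, size=(h, w, 3), dtype=np.uint8)

# Split into the three color-channel images I_b, I_g, I_r ...
I_b, I_g, I_r = image[..., 0], image[..., 1], image[..., 2]

# ... and stack them back as the frontal slices P[:, :, l] of a
# third-order tensor P in R^{h x w x 3}.
P = np.stack([I_b, I_g, I_r], axis=-1).astype(np.float64)
assert P.shape == (h, w, 3)
assert np.array_equal(P[:, :, 0], I_b)  # first frontal slice is one channel
```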

Fig. 2.

Tensor-based pupil image construction. (a) A given color pupil image $I \in \mathbb{R}^{h \times w}$; (b) blue, green, and red color channels $I_c$, $c \in \{b, g, r\}$, of (a); and (c) the third-order tensor $\mathcal{P} \in \mathbb{R}^{h \times w \times 3}$ of (a).

B. High-Order Singular Value Decomposition of a Color Pupil Image

A given $N$th-order tensor $\mathcal{P} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ can be expressed as a linear combination of outer products in different modes. Specifically, the high-order singular value decomposition (HOSVD) expresses the tensor $\mathcal{P}$ as follows:

$$\mathcal{P} = \mathcal{S} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 \cdots \times_N U^{(N)}. \quad (2)$$

It can also be expressed elementwise as:

$$p_{i_1 i_2 \cdots i_N} = \sum_{j_1 = 1}^{I_1} \sum_{j_2 = 1}^{I_2} \cdots \sum_{j_N = 1}^{I_N} s_{j_1 j_2 \cdots j_N}\, u^{(1)}_{i_1 j_1} u^{(2)}_{i_2 j_2} \cdots u^{(N)}_{i_N j_N}, \quad (3)$$

where $\mathcal{S} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ denotes the core tensor, which has the same size as the given tensor $\mathcal{P}$. $U^{(k)} \in \mathbb{R}^{I_k \times I_k}$, $k \in \{1, 2, \ldots, N\}$, are matrices containing the left singular vectors of the mode-$k$ unfolding matrices $\mathbf{P}_{(k)}$ of the tensor $\mathcal{P}$. These matrices $U^{(k)}$, also called inverse factors, are determined through the singular value decomposition of the mode-$k$ unfolding matrices $\mathbf{P}_{(k)}$ as follows:

$$\mathbf{P}_{(k)} = U^{(k)} \Sigma^{(k)} V^{(k)T}, \quad (4)$$

where $\Sigma^{(k)}$ denotes the singular value matrix and $V^{(k)}$ contains the right singular vectors of $\mathbf{P}_{(k)}$. Furthermore, the core tensor $\mathcal{S}$ is determined by:

$$\mathcal{S} = \mathcal{P} \times_1 U^{(1)T} \times_2 U^{(2)T} \times_3 \cdots \times_N U^{(N)T}. \quad (5)$$

As with the given tensor $\mathcal{P}$, the mode-$k$ unfolding matrices $\mathbf{S}_{(k)}$ of the core tensor $\mathcal{S}$ can be obtained by:

$$\mathbf{S}_{(k)} = \Sigma^{(k)} V^{(k)T} \left( U^{(N)} \otimes \cdots \otimes U^{(k+1)} \otimes U^{(k-1)} \otimes \cdots \otimes U^{(1)} \right), \quad (6)$$

where $\otimes$ denotes the Kronecker product of two matrices. Finally, a mode-$k$ unfolding matrix $\mathbf{P}_{(k)}$ of the given tensor $\mathcal{P}$ can be reconstructed by:

$$\mathbf{P}_{(k)} = U^{(k)} \mathbf{S}_{(k)} \left( U^{(N)} \otimes \cdots \otimes U^{(k+1)} \otimes U^{(k-1)} \otimes \cdots \otimes U^{(1)} \right)^{T}. \quad (7)$$
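Equations (4), (6), and (7) can be checked numerically. The sketch below (our helper names, not the authors' code) uses the identity $\Sigma^{(k)} V^{(k)T} = U^{(k)T}\mathbf{P}_{(k)}$, which follows from Eq. (4), and verifies that Eq. (7) recovers the mode-$k$ unfolding of a random tensor:

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding (Fortran order over the remaining modes)."""
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order="F")

def kron_skip(U, k):
    """Kronecker product U^(N) x ... x U^(k+1) x U^(k-1) x ... x U^(1)."""
    out = np.eye(1)
    for m in reversed([m for m in range(len(U)) if m != k]):
        out = np.kron(out, U[m])
    return out

rng = np.random.default_rng(2)
P = rng.standard_normal((4, 5, 3))

# Eq. (4): left singular vectors of every mode-k unfolding.
U = [np.linalg.svd(unfold(P, m))[0] for m in range(P.ndim)]

# Eq. (6), written as S_(k) = U^(k)T P_(k) (Kronecker factor).
k = 1
S_k = U[k].T @ unfold(P, k) @ kron_skip(U, k)

# Eq. (7): the unfolding of P is recovered from U^(k) and S_(k).
P_k_rec = U[k] @ S_k @ kron_skip(U, k).T
assert np.allclose(P_k_rec, unfold(P, k))
```

The reconstruction holds because the $U^{(k)}$ are orthogonal and the Kronecker product of orthogonal matrices is orthogonal.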

In this study, we first construct a third-order tensor $\mathcal{P} \in \mathbb{R}^{h \times w \times 3}$ from a given pupil image $I \in \mathbb{R}^{h \times w}$ as discussed in subsection II-A. Using HOSVD, $\mathcal{P}$ is then decomposed into multiple components: three inverse factors $U^{(1)}$, $U^{(2)}$, and $U^{(3)}$ and a core tensor $\mathcal{S}$, as depicted in Fig. 3. The HOSVD decomposition is summarized in Algorithm 1.

Fig. 3.

The results of HOSVD decomposition of the third-order tensor $\mathcal{P} \in \mathbb{R}^{h \times w \times 3}$ in Fig. 2(c). (a) Inverse factor $U^{(1)}$; (b) inverse factor $U^{(2)}$; (c) inverse factor $U^{(3)}$; and (d) core tensor $\mathcal{S}$.

Algorithm 1:

HOSVD

Input: a color pupil image $I \in \mathbb{R}^{h \times w}$
Output: $U^{(1)}$, $U^{(2)}$, $U^{(3)}$, and $\mathcal{S}$

 1. $I_c, c \in \{b, g, r\} \leftarrow \mathrm{Split}(I)$;
 2. $\mathcal{P} \in \mathbb{R}^{h \times w \times 3} \leftarrow \mathrm{Reshape}(I_c)$;
 3. get $(U^{(k)}, \Sigma^{(k)}, V^{(k)})$ via Eq. (4);
 4. get $\mathbf{S}_{(k)}$ via Eq. (6);
 5. $\mathcal{S} \leftarrow \mathrm{Reshape}(\mathbf{S}_{(k)})$;
 6. return $U^{(1)}$, $U^{(2)}$, $U^{(3)}$, $\mathcal{S}$.
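As a concrete illustration of Algorithm 1, a minimal NumPy sketch follows. This is our reading of the algorithm, not the authors' implementation; a synthetic frame stands in for a surgical image and the helper names are ours:

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding (Fortran order over the remaining modes)."""
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order="F")

def hosvd(P):
    """HOSVD as in Algorithm 1: inverse factors U^(k) and core tensor S."""
    # Eq. (4): U^(k) holds the left singular vectors of each unfolding.
    U = [np.linalg.svd(unfold(P, k))[0] for k in range(P.ndim)]
    # Eq. (5): core tensor S = P x_1 U^(1)T x_2 U^(2)T x_3 U^(3)T.
    S = P
    for k, Uk in enumerate(U):
        S = np.moveaxis(np.tensordot(Uk.T, S, axes=(1, k)), 0, k)
    return U, S

# Build the tensor of Fig. 2 from a synthetic color frame and decompose it.
rng = np.random.default_rng(3)
P = rng.random((27, 48, 3))
U, S = hosvd(P)

# Eq. (2): the inverse factors and core tensor reconstruct the original.
R = S
for k, Uk in enumerate(U):
    R = np.moveaxis(np.tensordot(Uk, R, axes=(1, k)), 0, k)
assert np.allclose(R, P)
```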

C. Tensor-based Pupil Feature Extraction

In this study, we investigated the information carried by the elements of the inverse factor $U^{(3)}$ and sought to emphasize elements of $U^{(3)}$ found to carry pupil-related information while de-emphasizing elements found to carry noise and background data. The feature-extracted image was then obtained by inverting the HOSVD with the original $U^{(1)}$ and $U^{(2)}$ and a modified $U_m^{(3)}$.

The results of this analysis are depicted in Fig. 4. As seen, the three elements in the first column of $U^{(3)}$ can reconstruct much of the three color channels (blue, green, and red) of the given input image. Accordingly, both pupil and interference information are mostly present in, and would be recovered from, these elements. In contrast, information specific to the pupil region appears to be present within the elements of the second column of $U^{(3)}$, while the third column contains primarily interference information.

Fig. 4.

The information of Fig. 2(a) carried by the nine elements of the inverse factor $U^{(3)} \in \mathbb{R}^{3 \times 3}$. The position of each image corresponds to the location of the investigated element in $U^{(3)}$.

To highlight the pupil region and eliminate the interference information in the given image, the elements of the first column of $U^{(3)}$ were de-emphasized by subtracting off the mean of the three elements. The proposed feature extraction algorithm is summarized in Algorithm 2, and its result is depicted in Fig. 5.

Fig. 5.

Results of the proposed tensor-based pupil feature extraction. (a) Original images from different cataract surgeries; (b) feature images of (a) produced by the proposed method.

Algorithm 2:

TPFE

Input: a pupil image $I \in \mathbb{R}^{h \times w}$
Output: a feature image $F \in \mathbb{R}^{h \times w}$

 1. $U^{(1)}, U^{(2)}, U^{(3)}, \mathcal{S} \leftarrow \mathrm{Algorithm\ 1}(I)$;
 2. $\mu \leftarrow \mathrm{mean}(U^{(3)}(1,1), U^{(3)}(2,1), U^{(3)}(3,1))$;
 3. $U^{(3)}(i,1) \leftarrow U^{(3)}(i,1) - \mu$, $i \in \{1, 2, 3\}$;
 4. get $\mathbf{P}^{(u)}_{(k)}$, $k \in \{1, 2, 3\}$, via Eq. (7);
 5. $\mathcal{P}^{(u)} \leftarrow \mathrm{Reshape}(\mathbf{P}^{(u)}_{(k)})$;
 6. $F_c, c \in \{b, g, r\} \leftarrow$ frontal slices $\mathbf{P}^{(u)}_{::l}$, $l \in \{1, 2, 3\}$, of $\mathcal{P}^{(u)}$;
 7. $F \leftarrow \mathrm{Merge}(F_b, F_g, F_r)$;
 8. return $F$.
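The TPFE steps can be illustrated in NumPy under the same conventions as the earlier sketches. This is our reading of Algorithm 2, not the authors' code; `tpfe` and the synthetic frame are illustrative:

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding (Fortran order over the remaining modes)."""
    return np.reshape(np.moveaxis(T, k, 0), (T.shape[k], -1), order="F")

def mode_product(T, U, k):
    """Mode-k product T x_k U."""
    return np.moveaxis(np.tensordot(U, T, axes=(1, k)), 0, k)

def tpfe(image):
    """Sketch of Algorithm 2: de-emphasize the first column of U^(3),
    then invert the HOSVD to obtain the feature image F."""
    P = image.astype(np.float64)                      # h x w x 3 tensor
    U = [np.linalg.svd(unfold(P, k))[0] for k in range(3)]
    S = P
    for k, Uk in enumerate(U):
        S = mode_product(S, Uk.T, k)                  # core tensor, Eq. (5)
    Um3 = U[2].copy()
    Um3[:, 0] -= Um3[:, 0].mean()                     # steps 2-3 of Algorithm 2
    # Invert the decomposition with U^(1), U^(2) and the modified U^(3).
    F = S
    for k, Uk in enumerate([U[0], U[1], Um3]):
        F = mode_product(F, Uk, k)
    return F

frame = np.random.default_rng(4).random((27, 48, 3))  # stand-in surgical frame
F = tpfe(frame)
assert F.shape == frame.shape
```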

III. EXPERIMENTAL RESULTS AND DISCUSSIONS

A. Dataset

To evaluate the performance of the proposed method, we adopted a dataset consisting of 4,560 intraoperative images of size 480×270 pixels from 152 cataract surgeries performed by experienced surgeons at the Kellogg Eye Center, University of Michigan, from 2020 to 2021. The pupil region in each image was manually annotated using MATLAB R2022a with a Wacom One drawing tablet.

B. Pupil Recognition by Deep Learning Models

To assess the performance of the proposed framework in improving the accuracy of pupil recognition, we adopted six state-of-the-art deep learning models: Xception [12], HR-Net [13], DeepLabV3+ [14], FPN [15], UNet [16], and LinkNet [17], each with VGG16 [18] as the backbone network. The selected models were all pre-trained on the ImageNet dataset [19] to reduce training time.

The dataset was randomly split into training and validation subsets comprising 75% and 25% of the data, respectively. Thus, 3,420 images from 114 videos were used for training, and 1,140 images from 38 videos were used for validation. Separate instances of each deep learning model were trained using either the original images or the feature images. A batch size of 8 and a learning rate of $10^{-4}$ were fixed for training. Each model was trained for a maximum of 100 epochs, corresponding to 42,700 training iterations. The precision, recall, Intersection over Union (IoU), and Dice coefficient of all six models on the validation set were then compared with and without use of the feature-extracted images. The results are shown in Table II. As seen, the accuracy of the deep learning models is considerably improved when combined with the proposed TPFE feature extraction method. Each of the six models tested demonstrated a statistically significant improvement in performance (p ≪ 0.05 on Wilcoxon signed-rank testing [20]) when utilizing the method.
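The evaluation metrics above can be computed per image from boolean masks; a minimal sketch (the tiny masks here are synthetic, not from the study's dataset):

```python
import numpy as np

def precision_recall_iou_dice(pred, gt):
    """Per-image segmentation metrics from boolean prediction/ground-truth masks."""
    tp = np.logical_and(pred, gt).sum()    # pixels correctly labeled pupil
    fp = np.logical_and(pred, ~gt).sum()   # pupil predicted, background in truth
    fn = np.logical_and(~pred, gt).sum()   # pupil missed by the prediction
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, iou, dice

pred = np.zeros((4, 4), dtype=bool); pred[1:3, 1:4] = True   # 6 px predicted
gt = np.zeros((4, 4), dtype=bool); gt[1:3, 0:3] = True       # 6 px annotated
p, r, iou, d = precision_recall_iou_dice(pred, gt)
assert np.isclose(iou, 0.5) and np.isclose(d, 2 / 3)
```

Paired per-image scores from the two training conditions can then be compared with `scipy.stats.wilcoxon` for the signed-rank test.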

TABLE II.

SEMANTIC SEGMENTATION RATE (%) OF DEEP LEARNING MODELS ON VALIDATION SET

Network Architecture Precision (%) Recall (%) IoU (%) Dice (%)

Xception 99.28 81.92 81.75 89.29
Xception + TPFE 98.97 85.71 85.37 91.50

HR-Net 99.24 86.88 86.72 92.36
HR-Net + TPFE 99.31 89.36 89.16 93.84

DeepLabV3+ 99.37 86.71 86.58 92.39
DeepLabV3+ + TPFE 99.13 89.49 89.21 93.95

FPN 99.38 88.78 88.62 93.53
FPN + TPFE 99.45 88.97 89.20 93.88

UNet 99.33 87.79 87.62 92.96
UNet + TPFE 99.20 89.01 88.79 93.58

LinkNet 99.35 88.35 88.18 92.30
LinkNet + TPFE 99.10 89.50 89.21 93.83

IV. CONCLUSION

In this paper, we have proposed a novel feature extraction method named TPFE for pupil recognition systems in cataract surgery using deep learning models. The effectiveness of TPFE was comprehensively demonstrated through a set of experiments using state-of-the-art deep learning models. The results reveal that the accuracy of pupil recognition systems in cataract surgery can be significantly improved through utilization of the proposed TPFE method.

ACKNOWLEDGMENT

This work was supported in part by the GME Innovations Fund (N.N., B.T.), The Doctors Company Foundation (N.N., B.T.), and NIH K12EY022299 (N.N.).

Contributor Information

Binh Duong Giap, Department of Ophthalmology & Visual Sciences, University of Michigan, 1000 Wall St., Ann Arbor, MI 48105, USA.

Karthik Srinivasan, Department of Vitreo Retinal, Aravind Eye Hospital, Chennai, Tamil Nadu 600077, India.

Ossama Mahmoud, Department of Ophthalmology and Visual Sciences, University of Michigan, 1000 Wall St., Ann Arbor, MI 48105, USA; Wayne State University School of Medicine, 540 E Canfield St., Detroit, MI 48201, USA.

Shahzad Ihsan Mian, Department of Ophthalmology & Visual Sciences, University of Michigan, 1000 Wall St., Ann Arbor, MI 48105, USA.

Bradford Laurence Tannen, Department of Ophthalmology & Visual Sciences, University of Michigan, 1000 Wall St., Ann Arbor, MI 48105, USA.

Nambi Nallasamy, Department of Computational Medicine & Bioinformatics, 100 Washtenaw Ave., Ann Arbor, MI 48109, USA; Department of Ophthalmology and Visual Sciences, University of Michigan, 1000 Wall St., Ann Arbor, MI 48105, USA.

REFERENCES

  • [1] Grzybowski A, "Recent developments in cataract surgery," Ann. Transl. Med., vol. 8, no. 22, pp. 1540–1545, Nov. 2020.
  • [2] Lee Y, Lee S, Jang S, Wang HJ, Seo YJ, and Yang S, "Pupil detection and segmentation for diagnosis of nystagmus with U-Net," in 2022 International Conference on Electronics, Information, and Communication (ICEIC), Jeju, Republic of Korea, 2022, pp. 1–2.
  • [3] Han SY, Kwon HJ, Kim Y, and Cho NI, "Noise-robust pupil center detection through CNN-based segmentation with shape-prior loss," IEEE Access, vol. 8, pp. 64739–64749, 2020.
  • [4] Gowroju S, Aarti, and Kumar S, "Robust pupil segmentation using UNET and morphological image processing," in 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt, 2021, pp. 105–109.
  • [5] Sokolova N, Taschwer M, Sarny S, Putzgruber-Adamitsch D, and Schoeffmann K, "Pixel-based iris and pupil segmentation in cataract surgery videos using Mask R-CNN," in 2020 IEEE 17th International Symposium on Biomedical Imaging Workshops (ISBI Workshops), Iowa City, IA, USA, 2020, pp. 1–4.
  • [6] Hackbusch W, Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics, Springer, Germany, 2012.
  • [7] Bengua JA, Phien HN, Tuan HD, and Do MN, "Efficient tensor completion for color image and video recovery: Low-rank tensor train," IEEE Trans. Image Process., vol. 26, no. 5, pp. 2466–2479, May 2017.
  • [8] Le TN, Giap DB, Wang J-W, and Wang C-C, "Tensor-compensated color face recognition," IEEE Trans. Inf. Forensics Secur., vol. 16, pp. 3339–3354, 2021.
  • [9] Giap DB, Le TN, Wang J-W, and Wang C-N, "Wavelet subband-based tensor for smartphone physical button inspection," IEEE Access, vol. 9, pp. 107399–107415, 2021.
  • [10] Kolda TG and Bader BW, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455–500, Sep. 2009.
  • [11] Kolda TG, "Multilinear operators for higher-order decompositions," Tech. Rep. SAND2006-2081, Sandia National Laboratories, 2006.
  • [12] Chollet F, "Xception: Deep learning with depthwise separable convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1251–1258.
  • [13] Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X, Liu W, and Xiao B, "Deep high-resolution representation learning for visual recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 10, pp. 3349–3364, Oct. 2021.
  • [14] Chen L-C, Zhu Y, Papandreou G, Schroff F, and Adam H, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proc. European Conference on Computer Vision (ECCV), 2018, pp. 801–818.
  • [15] Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, and Belongie S, "Feature pyramid networks for object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 936–944.
  • [16] Ronneberger O, Fischer P, and Brox T, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241.
  • [17] Chaurasia A and Culurciello E, "LinkNet: Exploiting encoder representations for efficient semantic segmentation," in IEEE Visual Communications and Image Processing (VCIP), 2017, pp. 1–4.
  • [18] Simonyan K and Zisserman A, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Represent. (ICLR), 2015, pp. 1–14.
  • [19] Deng J, Dong W, Socher R, Li L-J, Li K, and Fei-Fei L, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2009, pp. 248–255.
  • [20] Wilcoxon F, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
