Abstract
Deep networks achieve impressive accuracy on medical image analysis tasks when large datasets and annotations are available. However, learning over new sets of classes arriving over extended time is a different and difficult challenge, owing to the tendency of performance on old classes to degrade while adapting to new ones. Controlling such ‘forgetting’ is vital if deployed algorithms are to evolve as new data arrive incrementally. Incremental learning approaches usually rely on expert knowledge in the form of manual annotations or active feedback. In this paper, we explore the role that other forms of expert knowledge might play in making deep networks for medical image analysis immune to forgetting over extended time. We introduce a novel framework that mitigates this forgetting effect in deep networks by combining ultrasound video with the point-of-gaze of expert sonographers tracked during model training. This is used along with a novel weighted distillation strategy to reduce the propagation of effects due to class imbalance.
Keywords: Incremental learning, Eye tracking, Fetal ultrasound
1. Introduction
Deep networks typically need large quantities of labeled data [16,21,22]. In medical imaging, large datasets may not be available upfront but are instead collected over time [19,26]. Retaining past data over extended periods is often harder in medical imaging than for natural images, owing to privacy concerns, statutory limits on storage duration and memory constraints particular to clinical settings in different countries. Evolving diseases or diagnostic regulations may further require models to adapt to new data classes arriving over time. Models therefore need to learn incrementally without a decline in performance on previously trained tasks. Deep networks in medical imaging have often adapted to new tasks over time using transfer learning [2]. Recent work has shown that transfer learning, despite leveraging past learning, does not effectively balance previously learnt representations against current task knowledge
owing to catastrophic forgetting [7], wherein neural-network-based models show decreased performance on initially trained tasks when retrained on new tasks without access to the initial task data. This calls for regularizing the learning of current tasks with prior task knowledge, recently introduced as the continual learning paradigm in medical image analysis. Human sonographers acquire knowledge over time without losing proficiency on previously learnt skills. Can sonographers’ insights be used to improve knowledge transfer across sonography plane-finding tasks? We explore this question with a novel multimodal incremental learning approach using class-weighted distillation. The question of multimodal information preservation during incremental adaptation remains unexplored in the machine learning and medical imaging literature.
Recent work has attempted to reduce forgetting with replay of stored examples [14,25], parameter expansion [31], generative models [10,24] and weight regularization [13]. Beyond model compression, knowledge distillation [8] has been used in continual learning because the distillation loss can turn a snapshot of the knowledge learnt at a particular step into a regularizer for future learning. Recent distillation-based continual learning methods include Learning without Forgetting (LwF) [14], iCaRL [30], which incrementally performs representation learning, progressive distillation and retrospection (PDR) [9] and Learning without Memorizing (LwM) [4], where distillation is combined with class activations. In medical imaging, real-world cases of data arriving over time have prompted research on continual pipelines such as MRI segmentation with pixel-level regularization [20], hierarchical learning in echocardiography [27], progressive modelling of Alzheimer’s disease [32], consolidated distillation [11], distillation with ensembling [15] and privacy-preserving learning [29]. Compared to these, our method does not require retention of exemplars for past classes and implements a performance-driven weighting of the distilled representations. To our knowledge, the use of expert knowledge or other forms of multimodal input for incremental learning remains unexplored in medical imaging. Dedicated metrics for incremental learning were proposed in [5], but, like most prior work, they are derived from accuracy. Kim et al. [11] compute AUROC scores to assess overall performance after all learning sessions but do not define forgetting in terms of its effect on 1-vs-all AUC in multiclass incremental settings. Sonographer eye-tracking has been used for biometry plane localization [1], representation learning [6] and model compression [23]. Different from [23], we learn representations for incrementally learning new classes with the same trained model.
Contributions
We propose a framework that a) demonstrates the first usage of multimodal data combining ultrasound frames with expert gaze in incremental learning, b) introduces a novel weighted distillation strategy to reduce the impact of task-specific class imbalance during incremental learning, and c) proposes metrics for the assessment of incremental learning using both accuracy and AUC measures. We achieve superior incremental learning performance, in terms of mitigation of forgetting, without storing any past data. The AUC is computed as the area under the precision-recall (PR) curve per class in a 1-vs-all setting. A change in the number of false positives, as can occur under catastrophic forgetting, may cause only a small change in the false positive rate used in ROC estimation [3]. In PR curves, precision compares false positives to true positives, capturing the effect of large numbers of false positives on incremental performance.
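As a toy numeric illustration of this point (the counts below are invented for exposition, not drawn from our data), a large jump in false positives barely moves the ROC false positive rate when negatives are plentiful, yet it sharply lowers precision:

```python
# Toy counts (invented): one old class in a 1-vs-all evaluation.
tp, fn = 90, 10        # true positives and false negatives stay fixed
tn = 10_000            # negatives dominate in a multiclass pool

for fp in (10, 100):   # false positives before vs after forgetting
    fpr = fp / (fp + tn)          # false positive rate, used by ROC curves
    precision = tp / (tp + fp)    # used by PR curves
    print(f"fp={fp:4d}  FPR={fpr:.4f}  precision={precision:.3f}")
# FPR moves only from 0.0010 to 0.0099, while precision drops from 0.900 to 0.474.
```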
2. Methods
Incremental Learning
Without loss of generality, let the $t$-th stage of an $M$-stage incremental learning problem be a $K_t$-class classification task with class set $\mathcal{X}^t$, where each $X \in \mathcal{X}^t$ represents one class and the training samples for stage $t$ are drawn from it, i.e. $x^{t,\mathrm{train}} \in \mathcal{X}^t$. The objective is to recursively enable the classifier trained up to stage $t-1$, after completing the $t$-th stage of the incremental learning pipeline, to perform inference on test examples $x^{\mathrm{test}} \in \bigcup_{j=1}^{t} \mathcal{X}^j$ without declining performance. This is non-trivial when training data from previous stages are unavailable, making it impossible to jointly retrain the classifier on the entire dataset; the challenge is to adopt specific interventions that reduce catastrophic forgetting [7]. Here, $M = 2$ and $K_1 = K_2 = 3$. We consider a two-stage multitask incremental learning problem: Stage 1 learns a classifier for fetal biometry planes (head, femur, abdomen), while Stage 2 expands the Stage 1 classifier to echocardiography tasks, i.e. identification of three fetal cardiac standard planes: frames of the four chamber (4CH), three vessel (3VV) and left ventricular outflow tract (LVOT) views. We show that the final classifier does not suffer from forgetting on Stage 1 tasks without retraining on Stage 1 data. The output of the Stage 1 classifier $f^1(\cdot)$ takes the form $p = \mathrm{softmax}(z) \in \mathbb{R}^{K_1}$, where $z = f^1(x^{1,\mathrm{train}}) \in \mathbb{R}^{K_1}$ is the raw output of the last layer, or logits. The logits are retained as priors encoding old knowledge (similar to distillation [8]) and used in the second-stage training as regularization for the new task of classifying fetal echocardiography standard planes.
Gaze Acquisition
For including multimodal inputs in incremental learning, we consider a paired input tuple (I, G) of images and corresponding gaze maps, with their associated labels, for the initial training. To mimic situations where a gaze map G is not available during future task training, subsequent stages do not assume its presence and can function with the gaze map substituted by another modality or a redundant copy of the image. Similar to the protocol in [23], expert visual attention is captured by a gaze map G per image I, obtained by tracking the expert’s point-of-gaze while viewing I. The initial task session performs classification using the tuple of I and G as input. In subsequent incremental tasks, we allow for models that have access only to the image frames of the new classes, in line with the difficulty of acquiring gaze maps in deployment environments. Initial inclusion of gaze maps in the learning process can improve the representation learning of the base model, and this improved representation better protects against forgetting of the base tasks when adapting to novel tasks.
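A minimal sketch of this input pairing, assuming NumPy arrays for the frame and an optional gaze map; the helper name `make_paired_input` is ours, not from the paper:

```python
from typing import Optional, Tuple
import numpy as np

def make_paired_input(image: np.ndarray,
                      gaze_map: Optional[np.ndarray] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Return the (I, G) tuple fed to the twin-strand network.

    Stage 1 uses the recorded gaze map; in later stages, where gaze may be
    unavailable, the second strand receives a redundant copy of the image so
    the architecture and input signature stay unchanged.
    """
    if gaze_map is None:
        gaze_map = image.copy()   # redundant copy stands in for the missing gaze map
    return image, gaze_map
```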
Weighted Distillation
As defined in the previous section, the output of the first-stage classifier trained with a cross-entropy loss is $p = \mathrm{softmax}(z) \in \mathbb{R}^{K_1}$, where $z$ is the vector of logits. The classification loss in the first training stage is defined as:
$$\mathcal{L}_{CE} = -\sum_{i} y_i \log(p_i) \tag{1}$$
where $p_i$ is the predicted probability for each class of the task being learnt and $y_i$ the corresponding ground truth in one-hot form. In subsequent sessions, a knowledge distillation term is added to the objective to include past knowledge in the optimization process ($\hat{y}$ denotes the corresponding final-layer class scores, prior to the softmax operation, produced by the model being trained on the new task):
$$\mathcal{L}_{KD} = -\sum_{i=1}^{K_1} \mathrm{softmax}\!\left(\tfrac{z^{old}}{T}\right)_i \, \log \mathrm{softmax}\!\left(\tfrac{\hat{y}}{T}\right)_i \tag{2}$$
The logits and predictions are softened, as in standard distillation, with a temperature term $T$. After cross-entropy-based optimization the probability scores tend to be sharply peaked; dividing by the temperature spreads out the probability distribution and yields smoother targets. Here, $z^{old}$ represents the logits from the past task, obtained by computing class-specific average logits and combining them into a sum of class-weighted logits:
$$z^{old} = \sum_{i=1}^{K_1} w_i\, \bar{z}_i \tag{3}$$
The per-class logits $\bar{z}_i,\ i \in \{1, \ldots, K_1\}$, are obtained by averaging the pre-softmax scores (with sigmoid activation) over exemplars of each of the $K_1$ classes. The summation weights $(w_1, w_2, \ldots, w_{K_1})$ are calculated as the inverse of the class-specific AUC on validation data for the Stage 1 classes. This boosts the contribution of logits from a more difficult or underrepresented class (the lower the class AUC, the higher the class weight) to the overall representation retained from Stage 1. Otherwise, an initial class imbalance is propagated through distillation-based regularization of the old classes and exacerbated, since Stage 1 exemplars are not retained for mixing with Stage 2 data. Existing remedies for class imbalance, such as augmentation or weighted cross-entropy, assume that data for all classes to be learnt are available, which is not the case in distillation where samples from the initial classes are not retained for incremental training. The overall objective is then ($\lambda$ set to 0.5 by grid search):
$$\mathcal{L} = \mathcal{L}_{CE} + \lambda\, \mathcal{L}_{KD} \tag{4}$$
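A minimal TensorFlow sketch of one plausible reading of Eqs. (1)–(4); the function names are ours, the normalisation of the inverse-AUC weights and the per-sample application of the class weights are assumptions, and T = 3, λ = 0.5 follow the text:

```python
import tensorflow as tf

def class_weights_from_auc(val_auc_per_class):
    """Inverse of class-specific validation AUC: lower AUC -> larger weight."""
    w = 1.0 / tf.constant(val_auc_per_class, dtype=tf.float32)
    return w / tf.reduce_sum(w)              # normalisation is our assumption

def distillation_loss(stored_old_logits, current_old_logits, w, T=3.0):
    """Eqs. (2)-(3): temperature-softened cross-entropy between the stored,
    class-weighted Stage 1 logits and the current model's old-class scores."""
    target = tf.nn.softmax(w * stored_old_logits / T)     # class-weighted, softened
    log_pred = tf.nn.log_softmax(current_old_logits / T)
    return -tf.reduce_sum(target * log_pred, axis=-1)

def total_loss(y_true_new, p_new, stored_old_logits, current_old_logits,
               w, T=3.0, lam=0.5):
    """Eq. (4): new-task cross-entropy plus lambda-weighted distillation."""
    ce = tf.keras.losses.categorical_crossentropy(y_true_new, p_new)
    kd = distillation_loss(stored_old_logits, current_old_logits, w, T)
    return ce + lam * kd
```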
2.1. Training
The Stage 1 task uses data from [28]. Fetal ultrasound videos were acquired with simultaneous recording of sonographer eye-tracking data; recording and storage complied with local data governance policies. The data comprise 23016 abdomen, 24508 head and 12839 femur frames. A Tobii Eye Tracker 4C (Tobii, Sweden) recorded the point-of-gaze as relative (x, y) coordinates with timestamps at 90 Hz, capturing 3 gaze points per frame. Gaze points less than 0.5° apart were merged into a single fixation point. A sonographer visual attention map was generated for each frame using a truncated Gaussian whose width corresponds to a visual angle of 0.5° around the fixation point. In the incremental session, the data comprise fetal cardiac viewing planes with 9386 frames of the 4CH view, 6780 of LVOT and 6210 of 3VV. We thus use two datasets, D1 and D2, in the initial (Stage 1) and incremental (Stage 2) sessions respectively.
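A hedged sketch of the attention-map construction described above, assuming normalised (x, y) fixation coordinates and a 224 × 224 frame; the pixel radius corresponding to the 0.5° visual angle is left as a parameter (`radius_px`), since the viewing geometry is not given here:

```python
import numpy as np

def gaze_attention_map(fixations, height=224, width=224, radius_px=12):
    """Build a sonographer visual attention map from fixation points.

    fixations: iterable of (x, y) in [0, 1] relative coordinates.
    A Gaussian with sigma tied to the truncation radius is placed at each
    fixation and zeroed outside that radius (truncated Gaussian).
    """
    yy, xx = np.mgrid[0:height, 0:width]
    attn = np.zeros((height, width), dtype=np.float32)
    sigma = radius_px / 2.0                      # assumption: radius = 2 * sigma
    for fx, fy in fixations:
        cx, cy = fx * (width - 1), fy * (height - 1)
        d2 = (xx - cx) ** 2 + (yy - cy) ** 2
        g = np.exp(-d2 / (2 * sigma ** 2))
        g[d2 > radius_px ** 2] = 0.0             # truncate beyond the visual-angle radius
        attn = np.maximum(attn, g)
    return attn / (attn.max() + 1e-8)            # normalise to [0, 1]
```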
Model
We design the twin-input base model with parallel convolutional processing strands through the residual blocks, allowing independent convolutional operations on the image and the associated gaze map. The strands are derived from a ResNet-50 architecture (ablations with other backbones in Fig. 2) and configured to accept grayscale inputs. After the final residual blocks, the flattened feature maps of the two strands are concatenated. The inclusion of gaze maps follows the protocol in [23]. The fused layer feeds into a fully-connected layer of 512 units, followed by a pre-softmax layer with sigmoid activation to obtain classwise scores. This departs from standard ResNet models, where an average pooling layer follows the final residual blocks and feeds a dense layer with as many units as classes; it allows features to be aggregated while accounting for the impact of gaze incorporation on performance.
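A minimal Keras sketch of the twin-strand design, assuming two grayscale 224 × 224 inputs; `ResNet50` from `tf.keras.applications` stands in for the paper's residual-block strands, and replicating the grayscale input to three channels is our workaround rather than the paper's grayscale-configured backbone:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_twin_strand_model(n_classes, input_shape=(224, 224, 1)):
    """Parallel ResNet-50 strands for image and gaze map, fused before the head."""
    def strand(name):
        inp = layers.Input(shape=input_shape, name=f"{name}_input")
        # ResNet50 expects 3 channels; replicate the grayscale input (assumption).
        x = layers.Concatenate(name=f"{name}_to_rgb")([inp, inp, inp])
        backbone = tf.keras.applications.ResNet50(
            include_top=False, weights=None, input_shape=(224, 224, 3))
        backbone._name = f"{name}_resnet50"   # rename to avoid clashes between strands
        x = layers.Flatten(name=f"{name}_flatten")(backbone(x))
        return inp, x

    img_in, img_feat = strand("image")
    gaze_in, gaze_feat = strand("gaze")
    fused = layers.Concatenate(name="fusion")([img_feat, gaze_feat])
    fused = layers.Dense(512, activation="relu", name="fc512")(fused)
    # Pre-softmax layer with sigmoid activation, as described in the text.
    out = layers.Dense(n_classes, activation="sigmoid", name="classwise_scores")(fused)
    return Model(inputs=[img_in, gaze_in], outputs=out)
```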
Fig. 2. The variation of the AccDiff and AUCDiff metrics when different backbones are used to learn the fused representations. ResNet-50-based pipelines show the lowest AccDiff and AUCDiff consistently across the studied approaches and adapted baselines.
Training
For both stages and their datasets, data augmentation comprised rotations of up to 20° and horizontal flips, applied to both the image and gaze map frames [17]. Ultrasound frames and associated gaze maps were resized to 224 × 224. Models were trained with a subject-wise 80:20 split of the dataset (71 subjects for training, 18 for testing). Stage 1 models were trained for 200 epochs with a learning rate of 0.001 using adaptive moment estimation (Adam) [12]. Stage 2 models were trained for 200 epochs over the set of (frame, label, logit) tuples obtained by passing all N frames through the trained model. The softening temperature was set to 3.0 after a grid search over T ∈ [1, 5]. A study is labeled Gaze Dist (wt) when gaze is used with weighted distillation, Gaze Dist when gaze is used with distillation whose weights are not determined by initial-task results, and No Gaze Dist when the base model is trained without gaze maps; in the latter case the incremental task still uses weighted distillation. For the transfer learning benchmarks, the case with gaze maps available is labeled FT (gaze), and FT (no gaze) otherwise. The incremental stage is designed to accept image-level inputs as frames alone, making it resilient to unavailable or corrupt gaze data. Our models use 48.7 million parameters, with an average training time of 149 s per epoch on a cluster of two 24 GB Nvidia K80 GPUs. Gaze processing was done on a desktop computer with a 3.1 GHz 8th-generation Intel Core i7 processor and 512 MB RAM. Models were implemented in TensorFlow 2.0 with eager execution.
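A brief sketch of the stated preprocessing and Stage 1 optimiser settings; the arrays are random stand-ins, and pairing the two Keras generators with a shared seed is the usual recipe (not necessarily the authors' exact code) for keeping frame and gaze-map augmentations aligned:

```python
import numpy as np
import tensorflow as tf

# Dummy stand-ins for the resized frames and gaze maps (224 x 224 grayscale).
images = np.random.rand(8, 224, 224, 1).astype("float32")
gaze_maps = np.random.rand(8, 224, 224, 1).astype("float32")
labels = tf.keras.utils.to_categorical(np.random.randint(0, 3, 8), 3)

# Augmentation from the text: up to 20 degree rotations and horizontal flips.
aug = dict(rotation_range=20, horizontal_flip=True)
img_gen = tf.keras.preprocessing.image.ImageDataGenerator(**aug)
gaze_gen = tf.keras.preprocessing.image.ImageDataGenerator(**aug)

SEED = 7  # same seed on both streams keeps frame/gaze augmentations aligned
img_flow = img_gen.flow(images, labels, batch_size=4, seed=SEED)
gaze_flow = gaze_gen.flow(gaze_maps, batch_size=4, seed=SEED)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # Stage 1: Adam, lr 0.001
```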
3. Results and Discussion
Metrics
We report classwise and average accuracies for Stages 1 and 2 in Table 1, and AUCs for the initial classes in Table 2. AUCs are reported per class as 1-vs-all values and averaged for each stage. To track forgetting, the difference in average 1-vs-all AUC for the old classes across incremental stages is reported as AUCDiff, along with the fall in accuracy (AccDiff). Adaptation to new tasks is read directly from the accuracy and AUC values on the Stage 2 classes (Table 3); since this stage builds on Stage 1 learning, these values reflect both forgetting and transfer effects. Forward transfer effects are implicit in the new-task accuracy and AUC metrics, while the relative decline in accuracy and AUC for the initial task across Stages 1 and 2 encodes combined forgetting and backward transfer effects.
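A short sketch of these metrics, assuming per-class scores are available after each stage; scikit-learn's `average_precision_score` is used here as the 1-vs-all PR-based AUC described in the contributions, and the per-class accuracy definition is our assumption:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def stage_metrics(y_true, scores, old_classes):
    """Average accuracy and mean 1-vs-all PR-AUC over the old (Stage 1) classes."""
    preds = scores.argmax(axis=1)
    accs, aucs = [], []
    for c in old_classes:
        mask = y_true == c
        accs.append((preds[mask] == c).mean())           # class-specific accuracy
        aucs.append(average_precision_score((y_true == c).astype(int), scores[:, c]))
    return np.mean(accs), np.mean(aucs)

def forgetting(stage1, stage2):
    """AccDiff and AUCDiff: drop on the old classes between Stage 1 and Stage 2."""
    (acc1, auc1), (acc2, auc2) = stage1, stage2
    return acc1 - acc2, auc1 - auc2
```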
Table 1. Stage 1 and Stage 2 class-specific accuracies for the initial task classes, with averages; AccDiff quantifies the average drop in accuracy for the old classes.
| Method | Stage 1 AC | Stage 1 HC | Stage 1 FL | Stage 1 Avg | Stage 2 AC | Stage 2 HC | Stage 2 FL | Stage 2 Avg | AccDiff (Δ Avg) |
|---|---|---|---|---|---|---|---|---|---|
| Gaze Dist (wt) | 0.91 | 0.88 | 0.87 | 0.89 | 0.86 | 0.85 | 0.83 | 0.85 | 0.04 |
| No Gaze Dist | 0.73 | 0.76 | 0.74 | 0.74 | 0.65 | 0.63 | 0.62 | 0.63 | 0.11 |
| Gaze Dist | 0.91 | 0.88 | 0.87 | 0.89 | 0.83 | 0.80 | 0.79 | 0.81 | 0.08 |
| FT (gaze) | 0.91 | 0.88 | 0.87 | 0.89 | 0.63 | 0.54 | 0.56 | 0.58 | 0.31 |
| FT (no gaze) | 0.73 | 0.76 | 0.74 | 0.74 | 0.55 | 0.49 | 0.47 | 0.50 | 0.24 |
| Lwf/ewc | 0.74 | 0.76 | 0.74 | 0.75 | 0.60 | 0.61 | 0.59 | 0.60 | 0.15 |
| LwM | 0.73 | 0.76 | 0.75 | 0.75 | 0.64 | 0.62 | 0.61 | 0.62 | 0.12 |
| PDR | 0.73 | 0.76 | 0.75 | 0.75 | 0.61 | 0.58 | 0.57 | 0.59 | 0.16 |
| DDE | 0.73 | 0.76 | 0.75 | 0.75 | 0.66 | 0.63 | 0.65 | 0.65 | 0.10 |
Table 2. Stage 1 and Stage 2 class-specific 1-vs-all AUC values for the initial task classes, with averages; AUCDiff quantifies the average drop in 1-vs-all AUC for the old classes.
| Method | Stage 1 AC | Stage 1 HC | Stage 1 FL | Stage 1 Avg | Stage 2 AC | Stage 2 HC | Stage 2 FL | Stage 2 Avg | AUCDiff (Δ Avg) |
|---|---|---|---|---|---|---|---|---|---|
| Gaze Dist (wt) | 0.96 | 0.93 | 0.89 | 0.93 | 0.91 | 0.85 | 0.83 | 0.86 | 0.06 |
| No Gaze Dist | 0.75 | 0.77 | 0.72 | 0.75 | 0.63 | 0.62 | 0.61 | 0.62 | 0.13 |
| Gaze Dist | 0.9 | 0.89 | 0.86 | 0.88 | 0.81 | 0.78 | 0.79 | 0.79 | 0.09 |
| FT (gaze) | 0.9 | 0.89 | 0.86 | 0.88 | 0.65 | 0.6 | 0.58 | 0.61 | 0.27 |
| FT (no gaze) | 0.75 | 0.77 | 0.72 | 0.75 | 0.57 | 0.52 | 0.51 | 0.53 | 0.21 |
| Lwf/ewc | 0.75 | 0.79 | 0.73 | 0.76 | 0.61 | 0.63 | 0.60 | 0.61 | 0.15 |
| LwM | 0.75 | 0.79 | 0.73 | 0.76 | 0.64 | 0.61 | 0.61 | 0.62 | 0.14 |
| PDR | 0.75 | 0.79 | 0.73 | 0.76 | 0.63 | 0.65 | 0.62 | 0.63 | 0.13 |
| DDE | 0.75 | 0.79 | 0.73 | 0.76 | 0.68 | 0.67 | 0.64 | 0.66 | 0.10 |
Table 3. Stage 2 class-specific accuracies and 1-vs-all AUC for new task classes (the avg accuracy and AUC are a proxy for adaptation to the new task).
| Method | Accuracy 4CH | Accuracy 3VV | Accuracy LVOT | Accuracy Avg | AUC 4CH | AUC 3VV | AUC LVOT | AUC Avg |
|---|---|---|---|---|---|---|---|---|
| Gaze Dist (wt) | 0.89 | 0.8 | 0.82 | 0.84 | 0.92 | 0.81 | 0.76 | 0.83 |
| No Gaze Dist | 0.85 | 0.77 | 0.74 | 0.79 | 0.87 | 0.76 | 0.73 | 0.79 |
| Gaze Dist | 0.87 | 0.78 | 0.8 | 0.82 | 0.85 | 0.77 | 0.82 | 0.81 |
| FT (gaze) | 0.86 | 0.75 | 0.78 | 0.80 | 0.87 | 0.73 | 0.79 | 0.80 |
| FT (no gaze) | 0.83 | 0.72 | 0.73 | 0.76 | 0.82 | 0.75 | 0.71 | 0.76 |
| Lwf/ewc | 0.80 | 0.67 | 0.66 | 0.71 | 0.80 | 0.69 | 0.65 | 0.71 |
| LwM | 0.81 | 0.70 | 0.68 | 0.73 | 0.80 | 0.73 | 0.71 | 0.75 |
| PDR | 0.80 | 0.69 | 0.67 | 0.72 | 0.81 | 0.72 | 0.69 | 0.74 |
| DDE | 0.82 | 0.71 | 0.67 | 0.73 | 0.80 | 0.70 | 0.73 | 0.74 |
Results
Classwise performance on the old task is reported for the initial and incremental sessions in terms of accuracy and 1-vs-all AUC; the change in performance is the difference in these values across sessions. Overall accuracies and averaged AUCs are reported for all classes seen up to a given session to capture overall model performance (Table 1). The effect of gaze in reducing forgetting is evident: the difference in class-specific AUC is smaller than in cases that do not use gaze maps as additional input. We do not retain past exemplars in memory for continual learning; incremental regularization comes solely from the logits saved in Stage 1 when optimizing for the Stage 2 classes. Unlike existing approaches that selectively retain past data, we prioritize a reduced memory footprint while attaining superior continual learning performance using expert insights.
Discussion
Inclusion of gaze implies additional parameters and an incremental computational budget compared to off-the-shelf baselines. For fair comparison, we modified the baselines so that their parameter counts remain of the same order of magnitude as our proposed multimodal pipeline: 1) for baselines that do not use gaze maps, a parallel set of convolutional layers accepts the image as a redundant input, so the computational budget is comparable to the gaze-based approach; 2) for external baselines from the literature, we modified the representation learning stages to have parallel convolutional strands, as above, and enabled them to accept paired inputs, extending the comparison to cases where paired images and gaze maps are used as inputs. We choose baselines suited to contextualizing both weighted distillation and the inclusion of human knowledge. Comparisons are performed with adapted versions of LwF.EWC [11], proposed for X-ray incremental learning, progressive distillation and retrospection (PDR) [9], dual distillation and ensembling (DDE) [15] and Learning without Memorizing (LwM) [4], which does not retain exemplars. Methods using distillation outperform the transfer learning baselines in knowledge retention, evident from higher Stage 2 accuracy and 1-vs-all AUC values, and gaze-based models also perform better on the Stage 2 metrics. Some methods, such as LwF.MC [14] and iCaRL [30], are equivalent to our studies without gaze maps (‘No Gaze Dist’ in the tables) and are not separately benchmarked. The superior results of gaze-driven methods show that additional modalities enable deep networks to learn better input representations. Softening partly smooths incorrect labels in the input space for old tasks, reducing forward propagation of inaccuracies. Gains for complex classes, such as fetal head frames, are notable when using gaze. Unweighted distillation with gaze scales poorly compared to weighted distillation, underlining the role of the more complex classes: forgetting is more prominent for classes with large intraclass variation or difficult examples due to artefacts such as shadows and speckle [18,27]. A weighting strategy informed by initial performance metrics is seen to reduce forgetting here. For the fine-tuning baselines specifically, gaze inclusion appears to cause slightly more forgetting than otherwise, potentially because fine-tuning updates the entire parameter space and the multimodal data drive stronger representation learning in both old and new tasks, leading to greater shifts in parameter magnitudes when no effort is made to reduce forgetting. Ablations with different CNN backbones (Fig. 2) show that ResNet-50-based models outperform other backbones.
4. Conclusion
We proposed a multimodal pipeline for incremental learning in ultrasound imaging using sonographer eye-tracking data. The inclusion of gaze priors reduced forgetting and enabled performance gains over state-of-the-art methods without requiring retention of past tasks’ data. Further, we developed a weighted-logits approach to regularize future task learning and conceptualized new metrics to assess forgetting and new-task adaptation.
Fig. 1. The initially trained model (on Stage 1 tasks) is later trained for an incremental task at Stage t (here, t = 2), with cross-distillation using logits stored from initial stages.
References
- 1. Cai Y, Sharma H, Chatelain P, Noble JA. SonoEyeNet: standardized fetal ultrasound plane detection informed by eye tracking. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018); 2018. pp. 1475–1478.
- 2. Chen H, Ni D, Yang X, Li S, Heng PA. Fetal abdominal standard plane localization through representation learning with knowledge transfer. In: Wu G, Zhang D, Zhou L, editors. MLMI 2014, LNCS, vol. 8679. Springer, Cham; 2014. pp. 125–132.
- 3. Davis J, Goadrich M. The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning; New York, NY, USA; 2006. pp. 233–240.
- 4. Dhar P, Singh RV, Peng KC, Wu Z, Chellappa R. Learning without memorizing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2019. pp. 5138–5146.
- 5. Díaz-Rodríguez N, Lomonaco V, Filliat D, Maltoni D. Don’t forget, there is more than forgetting: new metrics for continual learning. arXiv:1810.13166; 2018.
- 6. Droste R, et al. Ultrasound image representation learning by modeling sonographer visual attention. In: Chung ACS, Gee JC, Yushkevich PA, Bao S, editors. IPMI 2019, LNCS, vol. 11492. Springer, Cham; 2019. pp. 592–604.
- 7. Goodfellow IJ, Mirza M, Xiao D, Courville A, Bengio Y. An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv:1312.6211; 2013.
- 8. Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. In: NIPS 2014 Deep Learning Workshop; 2014.
- 9. Hou S, Pan X, Change Loy C, Wang Z, Lin D. Lifelong learning via progressive distillation and retrospection. In: ECCV; 2018.
- 10. Kemker R, Kanan C. FearNet: brain-inspired model for incremental learning. arXiv:1711.10563; 2017.
- 11. Kim H-E, Kim S, Lee J. Keep and learn: continual learning by constraining the latent space for knowledge preservation in neural networks. In: Frangi AF, Schnabel JA, Davatzikos C, Alberola-López C, Fichtinger G, editors. MICCAI 2018, LNCS, vol. 11070. Springer, Cham; 2018. pp. 520–528.
- 12. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv:1412.6980; 2014.
- 13. Kirkpatrick J, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences. 2017;114(13):3521–3526. doi: 10.1073/pnas.1611835114.
- 14. Li Z, Hoiem D. Learning without forgetting. IEEE Trans Pattern Anal Mach Intell. 2017;40(12):2935–2947. doi: 10.1109/TPAMI.2017.2773081.
- 15. Li Z, Zhong C, Wang R, Zheng W-S. Continual learning of new diseases with dual distillation and ensemble strategy. In: Martel AL, et al., editors. MICCAI 2020, LNCS, vol. 12261. Springer, Cham; 2020. pp. 169–178.
- 16. Omar H, Patra A, Domingos J, Upton R, Leeson P, Noble J. Myocardial wall motion assessment in stress echocardiography by quantification of principal strain bull’s eye maps: P299. Eur Heart J Cardiovascular Imaging. 2017;18.
- 17. Omar HA, Domingos JS, Patra A, Leeson P, Noble JA. Improving visual detection of wall motion abnormality with echocardiographic image enhancing methods. In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2018. pp. 1128–1131.
- 18. Omar HA, Domingos JS, Patra A, Upton R, Leeson P, Noble JA. Quantification of cardiac bull’s-eye map based on principal strain analysis for myocardial wall motion assessment in stress echocardiography. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018); 2018. pp. 1195–1198.
- 19. Omar HA, Patra A, Domingos JS, Leeson P, Noble JA. Automated myocardial wall motion classification using handcrafted features vs a deep CNN-based mapping. In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2018. pp. 3140–3143.
- 20. Ozdemir F, Fuernstahl P, Goksel O. Learn the new, keep the old: extending pretrained models with new anatomy and images. In: Frangi AF, Schnabel JA, Davatzikos C, Alberola-López C, Fichtinger G, editors. MICCAI 2018, LNCS, vol. 11073. Springer, Cham; 2018. pp. 361–369.
- 21. Patra A, Huang W, Noble JA. Learning spatio-temporal aggregation for fetal heart analysis in ultrasound video. In: Cardoso MJ, et al., editors. DLMIA/ML-CDS 2017, LNCS, vol. 10553. Springer, Cham; 2017. pp. 276–284.
- 22. Patra A, et al. Sequential anatomy localization in fetal echocardiography videos. arXiv:1810.11868; 2018.
- 23. Patra A, et al. Efficient ultrasound image analysis models with sonographer gaze assisted distillation. In: Shen D, et al., editors. MICCAI 2019, LNCS, vol. 11767. Springer, Cham; 2019. pp. 394–402.
- 24. Patra A, Chakraborti T. Learn more, forget less: cues from human brain. In: Proceedings of the Asian Conference on Computer Vision (ACCV); 2020.
- 25. Patra A, Noble JA. Incremental learning of fetal heart anatomies using interpretable saliency maps. In: Zheng Y, Williams BM, Chen K, editors. MIUA 2019, CCIS, vol. 1065. Springer, Cham; 2020. pp. 129–141.
- 26. Patra A, Noble JA. Multi-anatomy localization in fetal echocardiography videos. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019); 2019. pp. 1761–1764.
- 27. Patra A, Noble JA. Hierarchical class incremental learning of anatomical structures in fetal echocardiography videos. IEEE J Biomed Health Informatics. 2020. doi: 10.1109/JBHI.2020.2973372.
- 28. PULSE: Perception Ultrasound by Learning Sonographic Experience. 2018. www.eng.ox.ac.uk/pulse
- 29. Ravishankar H, et al. Understanding the mechanisms of deep transfer learning for medical images. In: Deep Learning and Data Labeling for Medical Applications. Springer, Cham; 2016. pp. 188–196.
- 30. Rebuffi SA, Kolesnikov A, Sperl G, Lampert CH. iCaRL: incremental classifier and representation learning. In: Proceedings of the IEEE CVPR; 2017. pp. 2001–2010.
- 31. Rusu AA, et al. Progressive neural networks. arXiv:1606.04671; 2016.
- 32. Zhang J, Wang Y. Continually modeling Alzheimer’s disease progression via deep multi-order preserving weight consolidation. In: Shen D, et al., editors. MICCAI 2019, LNCS, vol. 11765. Springer, Cham; 2019. pp. 850–859.


