Abstract
Objective:
Spontaneous echo contrast (SEC) is a vascular ultrasound finding associated with increased thromboembolism risk. However, identification requires expert determination and clinician time to report. We developed a deep learning model that can automatically identify SEC. Our model can be applied retrospectively without deviating from routine clinical practice, so future work could scan archival data to opportunistically correlate SEC findings with documented clinical outcomes.
Methods:
We curated a data set of 801 archival acquisitions along the femoral vein from 201 patients. We used a multisequence convolutional neural network (CNN) with ResNetv2 backbone and visualized keyframe importance using soft attention. We evaluated SEC prediction performance using an 80/20 train/test split. We report receiver operating characteristic area under the curve (ROC-AUC), along with the Youden threshold-associated sensitivity, specificity, F1 score, true negative, false negative, false positive and true positive.
Results:
Using soft attention, we can identify SEC with an AUC of 0.74, sensitivity of 0.73 and specificity of 0.68. Without soft attention, our model achieves an AUC of 0.69, sensitivity of 0.71 and specificity of 0.60. Additionally, we provide attention visualizations and note that our model assigns higher attention score to ultrasound frames containing more vessel lumen.
Conclusion:
Our multisequence CNN model can identify the presence of SEC from ultrasound keyframes with an AUC of 0.74, which could enable screening applications and further SEC data discovery. The model does not require the expert intervention or additional clinician reporting time that are currently significant barriers to SEC adoption. Model and processed data sets are publicly available at https://github.com/Ouwen/automatic-spontaneous-echo-contrast.
Keywords: Deep learning, Spontaneous echo contrast, Point-of-care ultrasound, Femoral vein, COVID-19, Oncology, Deep vein thrombosis, Venous thromboembolism
Introduction
Spontaneous echo contrast (SEC) is a clinical finding that appears in ultrasound examinations and presents as echogenicity within normally anechoic blood. The presence of SEC is clinically associated with low-velocity blood flow, blood stasis and increased risk of thromboembolism events such as myocardial infarction, stroke, deep vein thrombosis (DVT) and pulmonary embolism (PE) [1−6]. Patient populations with elevated risk for thromboembolism include long-term stay hospitalized patients, trauma patients, pregnant women and patients with cancer receiving chemotherapy [7,8]. It is hypothesized that the underlying cause of SEC is red blood cell, platelet and leukocyte aggregates, which cause the backscatter echogenicity to be visible on ultrasound examination [9,10]. This aggregation, particularly the formation of Rouleaux structures at low shear rates, is a key factor in the echogenic appearance of SEC, as highlighted by Cloutier et al. [11]. We provide examples of a vein cross section presenting with and without SEC in Figure 1.
Figure 1.

Example of SEC+ and SEC− data. Left: Two sets of three keyframe vein sample sections with SEC absent (top) and present (bottom). We provide two examples of SEC+ in the POP and FV. We drew an arrow indicating where the SEC backscatter can be more clearly seen in the longitudinal lumen. SEC can be a subtle clinical finding, and labels can be noisy. SEC was determined by sonographers with 30+ y of experience. In the figure, SEC+/− corresponds to the presence and absence of SEC in the set. POP and FV correspond to the popliteal and femoral veins for the set. Trans and Long correspond to individual labels on frames for transverse and longitudinal views. Blue and red correspond to without compression and with compression. Green corresponds to compression not applicable. FV, femoral vein; POP, popliteal femoral vein; SEC, spontaneous echo contrast.
Spontaneous echo contrast findings have been clinically correlated with thromboembolism risk. COVID-19 is the latest addition to the list of clinical etiologies that elevate the risk for thromboembolism. A meta-analysis concluded that 22.7% of patients in the intensive care unit (ICU) for COVID-19 develop venous thromboembolism (VTE) [12]. Additionally, COVID-19 clotting risks can be elevated even 6 mo post-infection [13]. Predictably, SEC findings were recently reported in vascular point-of-care ultrasound studies for COVID-19-infected patients [14,15]. Lower-extremity SEC was present in 22 of 31 ICU patients with COVID-19 with anatomical locations in the femoral veins and the sapheno-femoral junction [15]. These results reinforce the existing link between SEC and thromboembolism.
Given the correlation between VTE and SEC, clinicians are interested in potentially using SEC as a prognostic marker for thromboembolic risk [16,17]. The presence of SEC may even suggest that earlier and/or higher-intensity thromboprophylaxis treatment is appropriate. However, to date, there is no large-scale, evidence-based study to warrant treatment change. Such a study would have high clinical impact given the many etiologies that cause thromboembolism.
There are many challenges to conducting such a large-scale study to map SEC to outcomes. One is that SEC identification requires a high level of expertise. For example, SEC observations were made by experienced point-of-care ultrasound trained imaging specialists with 5 to >30 y of experience [15,18]. With the appropriate training, one study reported that specialists can have an 89% agreement in the left atrium of the heart with increasing SEC identification difficulty for smaller anatomical vasculature [19]. Finally, SEC is not routinely reported outside of academic studies.
In the work described here, our goal was to automatically detect SEC using convolutional neural network (CNN) image classification methods. This model could reduce interpreter dependence and enable automatic background detection without deviation from clinical workflow. CNNs have been widely used in deep learning and have excelled at image classification, reaching and in some cases exceeding human-level performance [20]. Other approaches to more structured SEC identification include semiquantitative grading [21], but this approach still requires expert determination and time. Another approach integrates the backscatter present in the atrium lumen [22]; however, this approach requires a standardized view, calibration of gain, selection of a region of interest and a fixed set of frames to integrate. Our proposed CNN method is intended to run in the background, without disrupting existing clinical workflows. To this end, our training data are derived from archival data saved from routine ultrasound examinations and required no interventional change from the sonographer.
Methods
We curated an SEC-labeled, retrospective, vascular ultrasound imaging data set from archived DICOM (Digital Imaging and Communications in Medicine) images. In a previous study, ultrasound images were evaluated by two radiologists (V.A., a senior radiology resident, and C. J., a board-certified radiologist with an abdominal imaging fellowship). They confirmed the presence or absence of qualitative slow venous flow and the absence of initial or recent DVT. Discrepancies were resolved through consensus re-evaluation. The study defined slow venous flow without thrombus qualitatively as amorphous echogenicity in the fully compressible venous lumen, exceeding that of the adjacent artery. This definition provided a clear and replicable standard for identifying slow venous flow in ultrasound images. To further exclude the artifacts, labelers compared results with those for the adjacent artery during the study, and only reported SEC when the vein was more echogenic than the artery. The prior study noted that although cine imaging was frequently performed, it was not used as a standard because of its inconsistent availability across all cases. Similarly, our study did not rely on cine imaging for SEC identification, acknowledging the limitations in data availability [18].
Original point-of-care acquisitions were performed in routine vascular ultrasound examinations by an American College of Radiology (ACR)-accredited department from January 1, 2008, to January 1, 2015. This retrospective study was approved by the institutional review boards of both author institutions. Routine examinations were conducted as the participants were tertiary care oncology inpatients undergoing chemotherapy, a group recognized for their elevated risk of thromboembolism. Following the departmental protocol, the common femoral vein (CFV), femoral vein (FV) and popliteal femoral vein (POP) were imaged.
Most examinations were performed by Registry of Diagnostic Medical Sonography sonographers under the supervision of experienced sub-specialist radiologists. Either the Philips iU22 (Philips Healthcare, Bothell, WA, USA) or Logiq E9 (General Electric, Milwaukee, WI, USA) was used. Most examinations were performed using linear transducers, 9−12 MHz, but several examinations required curved array transducers, 2−5 MHz, to achieve better depth penetration and fields of view for limited sections of the femoral veins.
We provide an anatomical diagram of how keyframes are sampled from different sections in Figure 2A. Hereafter we refer to these as “sample sections.” We certify that this imaging data set has never been publicly released until now.
Figure 2.

Data pre-processing and distribution. (A) Anatomical region where sample sections are created. (B) Image processing of raw DICOM. (C) We show a count distribution of the number of keyframes for the left and right vein sections (CFV, FV, POP) where SEC is present (orange) and absent (blue). This illustrates that it would not be possible to trivially determine SEC from the number of keyframes saved. CFV, common femoral vein; FV, femoral vein; POP, popliteal femoral vein; SEC, spontaneous echo contrast.
Data set distribution
In clinical practice, SEC detection is often focused on specific vein sections where the phenomenon is likely to occur, rather than across an entire patient study. Moreover, this section-level analysis is essential for a nuanced understanding of SEC, allowing for precise interventions and patient monitoring. This localized approach is mirrored in our data set labeling and is reflected in our assumption that the data are independent at the vein section level. Although we recognize the value of patient-level analysis, our current data set reflects the clinical reality of SEC’s localized nature, and thus, our model is designed to align with the level at which radiological assessments are typically performed.
Our data set contains 1074 ground-truth sample sections from 201 patients, which come from six independently imaged sections: left/right femoral veins, left/right common femoral veins and left/right popliteal femoral veins. In the SEC classification task, 801 vein sections from 177 patients are SEC labeled and used for the training. Some patients and veins were excluded from the SEC task because of missing data and unclear labels, but they were used in other training tasks such as vein compression, scan view and vein section classification. Each sample section was given a label for SEC as “present” or “absent.” Our data set is relatively balanced; 484 sample sections have SEC present, and 317 sample sections have SEC absent.
Each sample section can range from 1 to 17 sonographer-selected frames (keyframes), yielding a total of 6568 keyframes. Keyframe counts were similarly distributed between SEC-present and SEC-absent sample sections. Thus, the number of keyframes in a sample section is not sufficient to determine SEC presence and does not leak the ground truth. We include histogram distributions of keyframe counts in Figure 2C. We performed an 80%/20% train/test split on our 801-sample vein section data set.
In Table S1 (online only), we outline the distribution of SEC across vein sections at the patient level. Given the qualitative variability among sections, we calculated the intra-patient pairwise Cohen’s κ and found low agreement between sections, with an average κ of 0.26. Additionally, because of the transient nature of SEC signals, we divided the data into training and testing sets and conducted the study at the vein section level rather than at the patient level. To mitigate potential bias, we also implemented a patient-level split and assessed our model on a held-out subset comprising 22.6% of the data, which included 40 patients and 194 vein sections (110 SEC present, 84 SEC absent). The specifics of this evaluation are detailed in Tables S2 and S3 (online only) and support the assumption that SEC labels are independent at the vein section level. Because the section-level split yields more balanced data and label distributions, we proceed with section-level splits in the experiments.
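The pairwise agreement statistic described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each pair of vein sections is treated as two "raters" over the patients labeled in both sections, and the names `cohen_kappa`, `mean_pairwise_kappa` and the toy labels are hypothetical.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of binary labels (0 or 1)."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label lists must be non-empty and of equal length")
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa1, pb1 = sum(a) / n, sum(b) / n
    # Agreement expected by chance from the marginal label frequencies.
    expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if expected == 1.0:
        return 1.0  # both lists are constant and identical
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(labels_by_section):
    """Average kappa over all pairs of vein sections.

    labels_by_section maps a section name to {patient_id: 0 or 1}.
    Only patients labeled in both sections of a pair are compared
    (an assumption made for this sketch).
    """
    kappas = []
    for s1, s2 in combinations(sorted(labels_by_section), 2):
        shared = sorted(set(labels_by_section[s1]) & set(labels_by_section[s2]))
        if len(shared) < 2:
            continue
        a = [labels_by_section[s1][p] for p in shared]
        b = [labels_by_section[s2][p] for p in shared]
        kappas.append(cohen_kappa(a, b))
    if not kappas:
        raise ValueError("no section pair shares at least two patients")
    return sum(kappas) / len(kappas)


# Toy labels (hypothetical): SEC present (1) or absent (0) per patient.
toy = {
    "left_FV": {1: 1, 2: 0, 3: 1, 4: 0},
    "right_FV": {1: 1, 2: 1, 3: 0, 4: 0},
}
print(mean_pairwise_kappa(toy))
```

A value near 0 indicates agreement no better than chance, and a value near 1 indicates that sections of the same patient almost always share a label.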
Data set pre-processing
There are many pre-processing steps required before the raw ultrasound DICOM images can be used for a deep learning application. Embedded annotation text in the DICOM images, including patient identifiers, anatomic location, view orientation and performance of vein compression, was removed from the images and captured as metadata before training the CNN model. Ultrasound DICOM images commonly have burned in text, which should not be exposed to a CNN-based model. Additionally, an exported DICOM may contain multiple ultrasound acquisitions in 2 × 1 or 4 × 1 layouts. For each DICOM image, we cropped an N × M layout into their individual keyframes. We provide a sample of the original DICOM in Figure 2B.
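The layout-cropping step can be illustrated with a short sketch. It assumes that the panels of a multi-acquisition export are equally sized and tile the image exactly, and that burned-in text has already been removed; the function name `split_layout` and the row/column convention are assumptions for illustration only.

```python
def split_layout(image, rows, cols):
    """Split a 2-D grayscale image into rows * cols equal tiles.

    image is a list of pixel rows; tiles are returned in row-major order.
    """
    height, width = len(image), len(image[0])
    if height % rows or width % cols:
        raise ValueError("image size must be divisible by the layout")
    tile_h, tile_w = height // rows, width // cols
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tile = [
                row[c * tile_w:(c + 1) * tile_w]
                for row in image[r * tile_h:(r + 1) * tile_h]
            ]
            tiles.append(tile)
    return tiles


# A 2 x 4 toy image holding two side-by-side 2 x 2 acquisitions.
export = [[1, 2, 3, 4],
          [5, 6, 7, 8]]
left, right = split_layout(export, rows=1, cols=2)
print(left)   # [[1, 2], [5, 6]]
print(right)  # [[3, 4], [7, 8]]
```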
Each grayscale keyframe was downsampled by 2 in each dimension, and we retained the original aspect as well as pixel ratios with an average varying resolution of 289 × 209. One of the computational innovations of this work is that our model can train efficiently on different-sized images using ragged tensors. Our data input is (N = 16, K, H/2, W/2, C = 3), where N is batch size, K is a variable number of frames, H is height, W is width and C is the number of channels, which typically correspond to an RGB image. As we have a gray-scale image, we simply replicate the gray channel to have three channels. This step is needed to re-use weights trained on the ImageNet format [23]. We use the following data augmentation scheme to improve model performance: (i) horizontal and vertical flipping, (ii) random 90° rotations, (iii) ±15% variations in brightness and (iv) ±15% variations in contrast, simulating the diverse conditions under which clinical ultrasound screenings are performed (such augmentations prepare the model to recognize SEC under varied imaging orientations, angles, perspectives and lighting conditions, reflecting the real-world variability in patient positioning and ultrasound machine settings; from a machine learning perspective, these augmentations reduce overfitting by introducing a wider range of training scenarios); and (v) normalizing the grayscale image dynamic range to between −1 and 1 (as a standard pre-processing step, it aids in neural network optimization, ensuring consistent image input scales for the model’s learning process).
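The per-keyframe steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the training pipeline itself: pixel values are assumed to be 8-bit (0 to 255), downsampling is done by simple decimation, and the brightness and contrast jitter formulas are one plausible reading of "±15%". All function names are hypothetical.

```python
import random


def downsample_by_2(image):
    """Keep every second pixel in each dimension (simple decimation)."""
    return [row[::2] for row in image[::2]]


def normalize(image, max_value=255.0):
    """Map pixel values from [0, max_value] to [-1, 1]."""
    return [[2.0 * p / max_value - 1.0 for p in row] for row in image]


def to_three_channels(image):
    """Replicate the gray channel so each pixel becomes (g, g, g)."""
    return [[(p, p, p) for p in row] for row in image]


def hflip(image):
    return [row[::-1] for row in image]


def vflip(image):
    return image[::-1]


def rot90(image):
    """Rotate the image 90 degrees counter-clockwise."""
    return [list(col) for col in zip(*image)][::-1]


def jitter(image, rng, amount=0.15, max_value=255.0):
    """Random brightness shift and contrast scale, each within +/- amount."""
    shift = rng.uniform(-amount, amount) * max_value
    scale = 1.0 + rng.uniform(-amount, amount)
    n = sum(len(row) for row in image)
    mean = sum(sum(row) for row in image) / n
    return [[min(max_value, max(0.0, (p - mean) * scale + mean + shift))
             for p in row] for row in image]


def preprocess(image, rng=None):
    """Downsample, optionally augment (when rng is given), normalize, add channels."""
    image = downsample_by_2(image)
    if rng is not None:
        if rng.random() < 0.5:
            image = hflip(image)
        if rng.random() < 0.5:
            image = vflip(image)
        for _ in range(rng.randrange(4)):
            image = rot90(image)
        image = jitter(image, rng)
    return to_three_channels(normalize(image))


print(preprocess([[0, 0, 255, 0], [0, 0, 0, 0]]))
```

Augmentation is applied only when a random generator is passed, which mirrors the usual practice of augmenting training samples but not test samples.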
Model architecture and training
ResNet50v2 is a very common CNN architecture for performing the image classification task [24]. We use this model pre-trained on ImageNet as a base model to perform feature extraction of our ultrasound keyframes. Because the clinical determination for SEC occurs over a set of keyframes, our data input must be a sample section, which is K keyframes. We model this problem using a multisequence CNN whose dimensional input is K × 224 × 224 × 3, and dimensional output is a single prediction for SEC.
One of the technical challenges is that our data are labeled on a set of keyframes, not individual keyframes. In machine learning literature, this is referred to as a multiple instance learning (MIL)-based problem [25]. In the MIL problem formulation, there are instances {x1, …, xK}, which make up a set X. In an ideal world, labels would exist at the instance level {x1, …, xK}; however, it is usually not feasible or cost-effective to have instance-level labeling. Instead, a single label is assigned to the bag, or set, X. In our case, the set of keyframes is the bag, and individual frames are our instances.
One required attribute to modeling MIL is permutation invariance. A bag X should return the same prediction regardless of the internal ordering. In previous work, deep learning has been applied to image-based MIL problems [26]. For each image instance {x1, …, xK}, an M-dimensional feature vector is calculated {h1, …, hK}. To achieve permutation invariance, a differentiable pooling operation such as max or mean pooling [27] is performed across instances of the bag. This collapses our bag of K instances into a single M-dimensional feature vector for subsequent classification.
To generate feature vectors, ResNet can take an H × W × 3 image as input and produce a 2048 × 1 feature vector. We do this for each keyframe, producing K × 2048 × 1 feature vectors. We then perform mean pooling of these features to create a single 2048 × 1 dimensional feature vector to represent our sample section. This pooled feature vector is fed into fully connected layers for prediction with sigmoidal activation. We provide an architectural diagram in Figure 3.
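A minimal sketch of the pooling step and its permutation invariance follows. Toy 2-dimensional vectors stand in for the 2048-dimensional ResNet features, and a single linear layer with a sigmoid stands in for the fully connected layers; `mean_pool`, `predict` and the weights are hypothetical names and values.

```python
import math


def mean_pool(features):
    """Average K feature vectors (each of length M) into one length-M vector."""
    k = len(features)
    return [sum(column) / k for column in zip(*features)]


def predict(features, weights, bias):
    """Bag-level probability from mean-pooled features (one linear layer + sigmoid)."""
    pooled = mean_pool(features)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


bag = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]  # K = 3 keyframes, M = 2 features
shuffled = [bag[2], bag[0], bag[1]]

print(mean_pool(bag))       # [3.0, 5.0]
print(mean_pool(shuffled))  # [3.0, 5.0], the keyframe order does not matter
print(predict(bag, weights=[0.5, -0.3], bias=0.0))
```

Because the pooled vector is the same for any ordering of the keyframes, the bag-level prediction is unchanged when the keyframes are shuffled.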
Figure 3.

Model architecture. Our model architecture takes N × H × W × 3 sample section inputs to produce a probability for SEC presence. A ResNet50v2 model performs feature extraction on each of the N keyframes to produce N × 2048 × 1 feature vectors. These vectors are mean pooled into a single 2048 × 1 feature vector for fully connected layers to perform SEC classification. We incorporate attention by broadcasting an N × 1 × 1 attention scalar to our N × 2048 × 1 feature vector before mean pooling. On the left is our approach for single frames, in the middle is the embedding approach and on the right is the attention approach. SEC, spontaneous echo contrast.
We use our model architecture to identify four clinical findings (Fig. 1): (i) presence of compression, which the sonographer typically applies to a vein to rule out DVT; (ii) transverse versus longitudinal view, defined relative to the orientation of the vasculature; (iii) anatomical location, as our data include keyframes from the FV, CFV and popliteal veins (we group FV and CFV together as one class to make this a binary classification task); and (iv) automatic detection of SEC. Compression and transverse-versus-longitudinal tasks are labeled at the individual keyframe-instance level, while anatomical location and SEC tasks are labeled at the group-bag level. In the single-frame cases, our model reduces to a simple ResNet50v2. In the multiframe or bag case, we take the mean pool of feature vectors. These models can serve future ultrasound image pre-processing applications. We expect high model performance on compression identification, orientation identification and view identification as these involve recognition of obvious anatomical findings.
In training for all tasks, an Adam optimizer [28] and a batch size as high as GPU memory allows were used. On the test set, we report the receiver operating characteristic area under the curve (ROC AUC), sensitivity, specificity and F1 score, and provide a confusion matrix as our main metrics for model performance [29]. These results are provided in Table 1 and Figure 4.
Table 1.
Metrics for the different binary classification tasks
| Task | AUC | Threshold | Sensitivity | Specificity | F1 score | TN | FP | FN | TP |
|---|---|---|---|---|---|---|---|---|---|
| Transverse vs. longitudinal | 0.9949 | 0.8605 | 0.9613 | 0.9886 | 0.9690 | 865 | 10 | 17 | 422 |
| With/without compression | 0.9568 | 0.6509 | 0.8961 | 0.9069 | 0.8961 | 419 | 43 | 43 | 371 |
| Anatomical view (FV + CFV vs. POP) | 0.9751 | 0.6040 | 0.9194 | 0.9344 | 0.8702 | 171 | 12 | 5 | 57 |
| SEC MIL embedding | 0.6927 | 0.2503 | 0.7113 | 0.6032 | 0.7225 | 38 | 25 | 28 | 69 |
| SEC MIL attention | 0.7356 | 0.3255 | 0.7320 | 0.6825 | 0.7553 | 43 | 20 | 26 | 71 |
| SEC MIL gated attention | 0.7174 | 0.9205 | 0.7320 | 0.5397 | 0.7208 | 34 | 29 | 26 | 71 |
AUC, area under the receiver operating characteristic curve; CFV, common femoral vein; FN, false negative; FP, false positive; FV, femoral vein; MIL, multiple instance learning; POP, popliteal femoral vein; SEC, spontaneous echo contrast; TN, true negative; TP, true positive.
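As a worked check, the sensitivity, specificity and F1 score in Table 1 follow directly from the confusion counts. The sketch below uses the standard definitions; the function name `binary_metrics` is hypothetical, and the counts are those of the SEC MIL attention row.

```python
def binary_metrics(tn, fp, fn, tp):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


# SEC MIL attention row of Table 1: TN = 43, FP = 20, FN = 26, TP = 71.
metrics = binary_metrics(tn=43, fp=20, fn=26, tp=71)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For this row the sketch reproduces the tabulated sensitivity (0.7320), specificity (0.6825) and F1 score (0.7553).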
Figure 4.

Area under the receiver operating characteristic (AUC ROC) curve of model performance on identifying spontaneous echo contrast using mean pool embedding and attention-weighted and gated attention-weighted approaches from a set of ultrasound keyframes.
Attention-based visualization
One challenge to our model is that each sample section contains multiple keyframes. However, as SEC is an amorphous, transient finding, not every keyframe will have obvious SEC findings. Ideally, certain frames should be given more importance than others.
To better model this reality, we use the method of soft attention, which scales feature vectors by a calculated “attention” or “importance” weight [26,30]. In some use cases, attention can improve model performance [31]. In our use case, soft attention can be additionally used as a visualization tool to examine which keyframes were given higher importance. Recall that mean pooling is represented by the equation

$$z = \frac{1}{K}\sum_{k=1}^{K} h_k$$

where K is the size of a bag and hk is the feature vector for xk. If we are able to calculate an attention scalar ak for each hk feature vector, we can turn this equation into

$$z = \sum_{k=1}^{K} a_k h_k$$

Note that in the case of uniform attention (ak = 1/K), this simplifies to the previous mean pooling equation. We can calculate ak as

$$a_k = \frac{\exp\{w^{T}\tanh(V h_k)\}}{\sum_{j=1}^{K}\exp\{w^{T}\tanh(V h_j)\}}$$

where V and wT respectively correspond to a fully connected layer with tanh activation and a 1-D fully connected layer. The softmax enforces that the attention sums to 1. Another concept we can incorporate is information gating [32]. This can be used to scale up or down features in the feature vector. We can add another learnable weight matrix U to construct the equation

$$a_k = \frac{\exp\{w^{T}(\tanh(V h_k)\odot\mathrm{sigm}(U h_k))\}}{\sum_{j=1}^{K}\exp\{w^{T}(\tanh(V h_j)\odot\mathrm{sigm}(U h_j))\}}$$

where ⊙ denotes element-wise multiplication and sigm is the sigmoid function.
For our ultrasound keyframes, each H × W × 3 image maps to a 2048 × 1 feature vector. We use our 2048 × 1 feature vector, but instead of using fully connected layers to predict SEC directly, we predict an unbounded attention value. For K images, we have K feature vectors and K attention values. We use a softmax activation so that the sum of K attention values is 1. These softmax activated attention values are then multiplied by their corresponding feature vectors. The case of full attention would be represented by a scalar value near 1, and feature values would remain unchanged. The case of no attention would have attention near 0, and the signal would be destroyed. We use these newly scaled feature vectors to predict SEC, as we did without an attention module.
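The attention computation described above can be sketched in plain Python. This is an illustrative reading of the soft attention and gated attention formulation of [26,32], not the released model code: the toy dimensions, the weight values and the names `attention_weights` and `attention_pool` are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax; the outputs sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attention_weights(features, V, w, U=None):
    """One attention weight per keyframe feature vector.

    V is an L x M matrix, w a length-L vector, and U an optional L x M
    matrix that enables the gated variant.
    """
    scores = []
    for h in features:
        hidden = [math.tanh(v) for v in matvec(V, h)]
        if U is not None:
            gate = [1.0 / (1.0 + math.exp(-u)) for u in matvec(U, h)]
            hidden = [t * g for t, g in zip(hidden, gate)]
        scores.append(sum(wi * hi for wi, hi in zip(w, hidden)))
    return softmax(scores)


def attention_pool(features, weights):
    """Attention-weighted sum of the K feature vectors."""
    return [sum(a * x for a, x in zip(weights, column))
            for column in zip(*features)]


bag = [[1.0, 2.0], [3.0, 4.0]]  # K = 2 keyframes, M = 2 toy features
V, w = [[1.0, 0.0]], [1.0]      # hypothetical learned parameters, L = 1
a = attention_weights(bag, V, w)
print(a, attention_pool(bag, a))
```

With all-zero attention parameters the scores are equal, the softmax returns 1/K for every keyframe, and the pooled vector equals the mean-pooled vector.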
Results
The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied, and the AUC is a measure of the classifier’s ability to differentiate between classes—in this case, the presence or absence of SEC—which is essential for reliable medical diagnosis. In the subclassification tasks, as detailed in Table 1, we observe high AUC values, such as 0.9949 for the transverse-versus-longitudinal task, 0.9568 for with/without compression and 0.9751 for the anatomical view (FV + CFV vs. POP) task. These high scores fall into the outstanding discrimination category, indicating that our model exhibits a high degree of accuracy in differentiating between the compared classes. These metrics confirm our model is properly implemented and can successfully identify obvious, high-signal imaging findings. These findings may be useful as utility tools in the SEC screening as locating the vein sections, conducting the compressive test and collecting ultrasound along the intended perspectives are essential parts of the screening protocol. We also provide these models as a library for future use in ultrasound image pre-processing use cases.
Conversely, for the multiframe SEC detection tasks, the AUC values are 0.69 for the MIL embedding, 0.74 for the MIL attention and 0.72 for the MIL gated attention models. The values suggest an acceptable discrimination. Although they do not reach an excellent threshold as in the previous tasks, it is important to recognize the inherent difficulty in detecting subtle features such as SEC, which can be easily confounded by noise and artifacts in the imaging data. Thus, these AUC values still reflect a meaningful capability of the models to discern SEC features to a useful degree, albeit with room for improvement.
In the context of clinical applications, sensitivity and specificity are crucial. Sensitivity, or the true positive rate, measures the model’s ability to correctly identify cases with SEC, which is vital to ensure patients at risk are not overlooked. Specificity, or the true negative rate, assesses how effectively the model recognizes cases without SEC, which is critical for minimizing unnecessary interventions. For the transverse-versus-longitudinal, with/without compression, anatomical view (FV + CFV vs. POP), SEC MIL embedding and SEC MIL attention tasks, we reported sensitivity, specificity, F1 scores and confusion matrices at the Youden threshold, which maximizes the distinguishability between the two classes. For the SEC MIL gated attention task, we set an arbitrary threshold of 0.9205 to increase sensitivity because the optimal threshold resulted in low sensitivity and high specificity, which is suboptimal for a screening task. Of the three SEC models, the SEC MIL attention model achieves the best sensitivity at 0.73 (matched by the gated attention model at its manually set threshold), an improvement over the non-attention MIL embedding model’s 0.71, while maintaining the best specificity at 0.68, outperforming the 0.60 and 0.54 of the non-attention MIL embedding model and the SEC MIL gated attention model, respectively. The ROC plot in Figure 4 also suggests that the SEC MIL attention model outperforms the other two in terms of distinguishability. We found that the attention mechanism leads to better model fitting. Surprisingly, adding the gated mechanism, which aims to reduce redundant information, negatively affected performance, possibly because of increased complexity and overfitting. With the highest AUC and F1 score among the SEC models, the SEC MIL attention model is our final choice moving forward. It achieves a balance with a slight emphasis on sensitivity, suggesting it could be an effective tool for initial screenings, flagging potential SEC cases for further expert examination.
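For readers unfamiliar with the Youden threshold, the selection can be sketched as below. This is a generic illustration on toy scores, not the evaluation code used in this study; it assumes a sample is predicted positive when its score is at or above the threshold, and the function names are hypothetical.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best = None
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        sens, spec = tp / positives, tn / negatives
        j = sens + spec - 1
        if best is None or j > best[0]:  # ties keep the lowest threshold
            best = (j, t, sens, spec)
    return best[1:]


def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(youden_threshold(labels, scores))  # (0.35, 1.0, 0.5)
print(roc_auc(labels, scores))           # 0.75
```

When two thresholds give the same J, this sketch keeps the lower one, which favors sensitivity; that tie-breaking rule is a design choice of the example, not a property of the Youden index.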
In addition to the quantitative evaluations, our attention module assigns non-uniform attention weightings across the keyframes of a sample section, which lets us visualize which keyframes the model treats as most important. We provide a representative sample of attention weightings across a set of keyframes in Figure 5. Note that higher attention scores are given to longitudinal views, likely because more vessel lumen is present.
Figure 5.

Representative attention visualization on an ultrasound keyframe sample section with and without SEC presence. Note that in the SEC-present sample section, higher attention scores are given to frames with more vessel lumen. This corresponds to longitudinal views. Note the high attention scores highlighted in the red box. SEC, spontaneous echo contrast.
Discussion
In this work, we can identify SEC automatically with an AUC of 0.74. For context, in a past study, agreement among two observers for SEC in the left atrium of the heart was 89% [19]. This reference illustrates the complexity of detecting SEC, especially in small structures, and the inherent challenges in achieving perfect agreement even among experts. Our model’s performance offers valuable insights and aids in a field where absolute certainty is rare, and we encourage future studies to examine inter-reader variability of SEC detection. Our work involves the femoral and popliteal veins, which are much smaller than the atrium of a heart and would result in more challenging SEC detection. There is likely room for model performance improvement, and future work should examine the agreement rate of SEC detection by trained radiologists. At the current performance, our model is designed primarily to augment data collection and serve as a preliminary screening tool, assisting in the organization and analysis of data sets for research. This initial step is critical for laying the groundwork for advanced artificial intelligence development in SEC detection. Although the model holds potential for future clinical screening, its deployment in such applications depends on continual refinement and validation of its performance.
At present, most radiologists do not report the SEC signal. Our model aims to be a pioneer in this area, assisting in data collection during ultrasound screenings and raising awareness of SEC in both academic and clinical settings. Specifically, our utility models can assist clinicians by localizing the vein section in lower-extremity screening, identifying the view orientation and labeling the results of compression tests with high performance. Our SEC model can then preliminarily filter likely examples for further research review. Given the high prevalence of SEC and the lack of immediate urgency, screening should prioritize higher sensitivity. The model’s imperfections may include false positives, but given that SEC signals often contain artifacts, it still aids clinicians in manually inspecting and confirming incidences. Our work aims to save clinicians time, allowing them to quickly gather data with initial filtering and concentrate on the most pressing issues, and to surface more SEC data at the early screening stage to support further research on this topic.
Ultimately, the goal of our work is to create a model that can detect SEC while running in the background of routine ultrasound examination. One limitation of this work is that we use 1–15 preselected ultrasound keyframes (interquartile range = 4) to identify SEC. Although our model can distinguish collections of keyframes with and without SEC, keyframe selection in the data set depends on an experienced sonographer. Ultrasound is a real-time modality with frame rates as high as 120 frames/s, and ideally a full cine loop would be available instead of selected keyframes. We acknowledge the lack of cine-loop data as a significant limitation. Our model is currently trained on static images, which do not capture the time-dependent nature of SEC, an aspect crucial for accurate detection given that SEC manifests as a dynamic, time-variant signal. Static images offer a snapshot but lack the motion context provided by cine loops, which could allow a model to detect the nuanced temporal changes indicative of SEC. The data used in our study reflect what was available and accessible at the time, which did not include cine-loop data; this is a clear area for improvement in future iterations of our research. Where cine loops are available, a workaround could involve deconstructing them into keyframes suitable for our model’s use. We hope our study serves as an early-stage proof of concept and inspires cine-loop data collection and study in the future. With a better understanding of the problem and more high-quality data that represent a practical clinical use case, we anticipate a significant enhancement in the model’s diagnostic capabilities, aligning it more closely with the real-time decision-making process in clinical ultrasound examinations. Despite these challenges, our research provides a foundational approach to SEC detection using a deep learning model. It is an initial step toward more refined models, trained on more data, that could one day operate alongside standard clinical evaluations and potentially offer real-time SEC assessments.
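As a rough illustration of the cine-loop workaround mentioned above, the sketch below picks at most 15 evenly spaced frame indices from a loop, matching the upper end of the keyframe count used in this work. Uniform sampling is an assumption made for illustration only; it does not reproduce how a sonographer selects keyframes, and the function name and the 240-frame example are hypothetical.

```python
def sample_keyframe_indices(num_frames, max_keyframes=15):
    """Pick up to max_keyframes evenly spaced frame indices from a cine loop."""
    if max_keyframes < 1:
        raise ValueError("max_keyframes must be at least 1")
    if num_frames <= 0:
        return []
    if num_frames <= max_keyframes:
        # Short loops are kept whole.
        return list(range(num_frames))
    if max_keyframes == 1:
        return [0]
    # Integer arithmetic keeps the first and last frames and avoids rounding ties.
    return [
        (i * (num_frames - 1)) // (max_keyframes - 1)
        for i in range(max_keyframes)
    ]


# Hypothetical 2 s cine loop at 120 frames/s.
indices = sample_keyframe_indices(240)
print(indices)
```

Because uniformly sampled frames may miss the 1 to 2 frames in which SEC is visible, a content-aware selection step would likely be needed in practice.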
Future work that transitions from keyframe-based to cine loop-based SEC detection may benefit from using long short-term memory (LSTM) networks [33]. Keyframes are typically saved because they capture new information, whereas cine loops consist largely of similar, temporally adjacent frames. LSTMs have a gating mechanism that may be better suited to handling the redundant information present in cine-loop frames. LSTMs also maintain an internal hidden state, analogous to our mean-pooled feature vector, which can be used for SEC prediction.
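To make the role of the gates and the hidden state concrete, the following is a minimal single-unit LSTM cell in plain Python using the standard LSTM update equations. It is only a sketch: the scalar per-frame feature, the parameter values and the function names are hypothetical, the weights are untrained, and a practical model would use vector-valued CNN features and a deep learning framework.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, params):
    """One step of a single-unit LSTM; params maps gate name -> (w_x, w_h, bias)."""
    def pre_activation(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h + bias

    i = sigmoid(pre_activation("input"))        # how much new content to write
    f = sigmoid(pre_activation("forget"))       # how much old cell state to keep
    o = sigmoid(pre_activation("output"))       # how much cell state to expose
    g = math.tanh(pre_activation("candidate"))  # candidate content
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def summarize_sequence(frame_features, params):
    """Run the cell over per-frame features and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in frame_features:
        h, c = lstm_step(x, h, c, params)
    return h


# Hypothetical, untrained parameters shared by all four gates.
params = {name: (1.0, 0.5, 0.0)
          for name in ("input", "forget", "output", "candidate")}
# Hypothetical scalar features for five temporally adjacent, similar frames.
final_state = summarize_sequence([0.2, 0.2, 0.21, 0.2, 0.9], params)
print(final_state)
```

The final hidden state plays the role that the mean-pooled feature vector plays in our current model: a fixed-size summary of the sequence that can be passed to a classifier.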
One of the main challenges of this work is that SEC presence is labeled at the bag level. This weak label is what led us toward MIL-based modeling. Ideally, frame-level instance labels would exist; however, these would be costly to generate. Our tasks that classify compression and vascular orientation benefit from instance-based labels. These tasks converged quickly and required only two epochs of training to achieve the performance outlined in Table 1. Our task of identifying the anatomical section (CFV, FV vs. POP) can be thought of as an MIL problem with a high witness rate [25]. Training for anatomical section identification also converged strongly after two epochs. However, SEC presence has a relatively low witness rate of 1 to 2 keyframes, which makes convergence more challenging. Training for more epochs or with lower learning rates did not significantly improve the test set metrics.
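The witness rate discussed above can be made concrete with a small Python example. Here the witness rate is taken to be the fraction of instances in a bag that carry the positive class, and a bag is positive if at least one instance is positive (the standard MIL assumption). The frame-level labels below are hypothetical; our data set provides only bag-level SEC labels.

```python
def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive if any instance is positive."""
    return int(any(instance_labels))


def witness_rate(instance_labels):
    """Fraction of instances in the bag that are positive."""
    return sum(instance_labels) / len(instance_labels)


# Hypothetical SEC bag: 10 keyframes, of which 2 show SEC.
sec_bag = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
# Hypothetical anatomical-section bag: every keyframe shows the same section.
section_bag = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(bag_label(sec_bag), witness_rate(sec_bag))
print(bag_label(section_bag), witness_rate(section_bag))
```

With a low witness rate, most instances in a positive bag look like instances from negative bags, so the bag-level label gives a weaker training signal per frame.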
Incorporating attention improves the model AUC from 0.69 to 0.74. However, gated attention achieved only 0.72, which does not improve on our attention-only model. When we examine the attention scalars illustrated in Figure 5, we see that longitudinal images receive the majority of the attention weighting (>50%). This may be due to the presence of a darker lumen, which is where backscatter could be present. However, even though there is an extremely prominent SEC backscatter finding in the transverse view of SEC+, the attention score assigned to it is only 4.57%. Thus, attention alone does not perfectly model this problem. Future work investigating instance-based labels may significantly increase AUC performance.
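For readers unfamiliar with attention pooling, the sketch below contrasts mean pooling with soft and gated attention pooling over a bag of keyframe features, following the general form of attention-based MIL pooling [26]. It is an illustration under stated assumptions, not our trained model: the two-dimensional features, the matrices V and U, the vector w and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features, V, w, U=None):
    """Per-keyframe weights: softmax_k of w . tanh(V h_k), optionally gated by sigmoid(U h_k)."""
    scores = []
    for h in features:
        hidden = [math.tanh(dot(row, h)) for row in V]
        if U is not None:
            gate = [1.0 / (1.0 + math.exp(-dot(row, h))) for row in U]
            hidden = [t * g for t, g in zip(hidden, gate)]
        scores.append(dot(w, hidden))
    return softmax(scores)


def pool(features, weights):
    """Weighted sum of keyframe feature vectors."""
    dim = len(features[0])
    return [sum(a * h[d] for a, h in zip(weights, features)) for d in range(dim)]


# Hypothetical features for a bag of three keyframes.
features = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5]]
V = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]

uniform = [1.0 / len(features)] * len(features)
soft = attention_weights(features, V, w)
gated = attention_weights(features, V, w, U=U)

print("mean pooled: ", pool(features, uniform))
print("soft weights:", soft, "pooled:", pool(features, soft))
print("gated weights:", gated, "pooled:", pool(features, gated))
```

The weights sum to 1 across the bag, so they can be read as percentages in the same way as the attention scores in Figure 5. Mean pooling is the special case in which every keyframe receives the same weight.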
Spontaneous echo contrast is not widely reported outside of academic studies; however, this imaging biomarker has been found to be qualitatively relevant in predicting thromboembolism. Our hope is that this detection model and machine learning data set for SEC increase the practicality of a larger-scale surveillance study to establish a more quantitative link between SEC presence and thromboembolism outcomes.
Conclusion
Our multisequence attention CNN model was able to identify SEC from ultrasound keyframes with an AUC of 0.74, a sensitivity of 0.73 and a specificity of 0.68. We investigated the use of soft attention to visualize which keyframes are important to the classification of SEC. This work indicates the feasibility of automatic SEC screening, which does not require specialist interpretation or deviation from routine clinical ultrasound. The retrospective nature of our model means future works could scan archival data to opportunistically correlate SEC findings with documented clinical outcomes. The model and data sets are publicly available at https://github.com/Ouwen/automatic-spontaneous-echo-contrast.
Acknowledgments
This study was funded by the Ruth L. Kirschstein National Research Service Award (5F30HL156547-03) and the National Institutes of Health (5T32GM007171-44).
Footnotes
Conflict of interest
The authors declare no competing interests.
Supplementary materials
Supplementary material associated with this article can be found in the online version at doi:10.5281/zenodo.10795591.
Data availability statement
All data and code used for this work are publicly available at https://github.com/Ouwen/automatic-spontaneous-echo-contrast. The data set and code may be subject to follow-up revisions.
References
- [1] Sigel B, Coelho JC, Spigos DG, Flanigan DP, Schuler JJ, Kasprisin DO, et al. Ultrasonography of blood during stasis and coagulation. Invest Radiol 1981;16:71–6.
- [2] Steinberg EH, Madmon L, Wesolowsky H, Feliciano EA, Sanfilipo MP, Sedlis SP, et al. Prognostic significance of spontaneous echo contrast in the thoracic aorta: relation with accelerated clinical progression of coronary artery disease. J Am Coll Cardiol 1997;30:71–5.
- [3] Fatkin D, Kelly RP, Feneley MP. Relations between left atrial appendage blood flow velocity, spontaneous echocardiographic contrast and thromboembolic risk in vivo. J Am Coll Cardiol 1994;23:961–9.
- [4] Daniel WG, Nellessen U, Schröder E, Nonnast-Daniel B, Bednarski P, Nikutta P, et al. Left atrial spontaneous echo contrast in mitral valve disease: an indicator for an increased thromboembolic risk. J Am Coll Cardiol 1988;11:1204–11.
- [5] Hsu HY, Chung CP, Chen SY, Chiang YY, Hu HH. Spontaneous echo contrast in internal jugular veins: a probable indicator for systemic inflammation and a prothrombotic state. Ultrasound Med Biol 2012;38:926–32.
- [6] Ito T, Suwa M. Left atrial spontaneous echo contrast: relationship with clinical and echocardiographic parameters. Echo Res Pract 2019;6:R65–73.
- [7] Anderson FA Jr, Spencer FA. Risk factors for venous thromboembolism. Circulation 2003;107:I9–16.
- [8] Mehta Y, Bhave A. A review of venous thromboembolism risk assessment models for different patient populations: what we know and don’t! Medicine 2023;102:e32398.
- [9] Zotz RJ, Müller M, Genth-Zotz S, Darius H. Spontaneous echo contrast caused by platelet and leukocyte aggregates? Stroke 2001;32:1127–33.
- [10] Merino A, Hauptman P, Badimon L, Badimon JJ, Cohen M, Fuster V, et al. Echocardiographic “smoke” is produced by an interaction of erythrocytes and plasma proteins modulated by shear forces. J Am Coll Cardiol 1992;20:1661–8.
- [11] Cloutier G, Weng XD, Roederer GO, Allard L, Tardif F, Beaulieu R. Differences in the erythrocyte aggregation level between veins and arteries of normolipidemic and hyperlipidemic individuals. Ultrasound Med Biol 1997;23:1383–93.
- [12] Nopp S, Moik F, Jilma B, Pabinger I, Ay C. Risk of venous thromboembolism in patients with COVID-19: a systematic review and meta-analysis. Res Pract Thromb Haemost 2020;4:1178–91.
- [13] Katsoularis I, Fonseca-Rodríguez O, Farrington P, Jerndal H, Lundevaller EH, Sund M, et al. Risks of deep vein thrombosis, pulmonary embolism, and bleeding after covid-19: nationwide self-controlled cases series and matched cohort study. BMJ 2022;377:e069590.
- [14] Connor-Schuler R, Daniels L, Coleman C, Harris D, Herbst N, Fiza B. Presence of spontaneous echo contrast on point-of-care vascular ultrasound and the development of major clotting events in coronavirus disease 2019 patients. Crit Care Explor 2021;3:e0320.
- [15] Dugar S, Duggal A, Bassel A, Soliman M, Moghekar A. Spontaneous echo contrast in venous ultrasound of severe COVID-19 patients. Intensive Care Med 2020;46:1637–9.
- [16] Kupczyńska K, Kasprzak JD, Michalski B, Lipiec P. Prognostic significance of spontaneous echocardiographic contrast detected by transthoracic and transesophageal echocardiography in the era of harmonic imaging. Arch Med Sci 2013;9:808–14.
- [17] Leung DY, Black IW, Cranney GB, Hopkins AP, Walsh WF. Prognostic implications of left atrial spontaneous echo contrast in nonvalvular atrial fibrillation. J Am Coll Cardiol 1994;24:755–62.
- [18] Jensen CT, Chahin A, Amin VD, Khalaf AM, Elsayes KM, Wagner-Bartak N, et al. Qualitative slow blood flow in lower extremity deep veins on Doppler sonography: quantitative assessment and preliminary evaluation of correlation with subsequent deep venous thrombosis development in a tertiary care oncology center. J Ultrasound Med 2017;36:1867–74.
- [19] Kronik G, Stöllberger C, Schuh M, Abzieher F, Slany J, Schneider B. Interobserver variability in the detection of spontaneous echo contrast, left atrial thrombi, and left atrial appendage thrombi by transoesophageal echocardiography. Br Heart J 1995;74:80–3.
- [20] Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data 2021;8:53.
- [21] Fatkin D, Loupas T, Jacobs N, Feneley MP. Quantification of blood echogenicity: evaluation of a semiquantitative method of grading spontaneous echo contrast. Ultrasound Med Biol 1995;21:1191–8.
- [22] Klein AL, Murray RD, Black IW, Chandra S, Grimm RA, DSa DA, et al. Integrated backscatter for quantification of left atrial spontaneous echo contrast. J Am Coll Cardiol 1996;28:222–31.
- [23] Deng J, Dong W, Socher R, Li LJ, Li K, Li FF. ImageNet: a large-scale hierarchical image database. In: Proceedings, 2009 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops). Miami, FL. New York. IEEE; 2009. p. 248–55.
- [24] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV. New York. IEEE; 2016. p. 770–8.
- [25] Carbonneau MA, Cheplygina V, Granger E, Gagnon G. Multiple instance learning: a survey of problem characteristics and applications. arXiv 2016:1612.03365.
- [26] Ilse M, Tomczak JM, Welling M. Attention-based deep multiple instance learning. arXiv 2018:1802.04712.
- [27] Boureau YL, Ponce J, LeCun Y. A theoretical analysis of feature pooling in visual recognition. Proceedings, 27th International Conference on Machine Learning (ICML)-10. Madison, WI, United States: Omnipress; 2010. p. 111–8.
- [28] Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv 2014:1412.6980.
- [29] Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med 2013;4:627–35.
- [30] Wang H, Zhang Y, Yu X. An overview of image caption generation methods. Comput Intell Neurosci 2020;2020:3062706.
- [31] Datta SK, Shaikh MA, Srihari SN, Gao M. Soft attention improves skin cancer classification performance. In: Interpretability of machine intelligence in medical image computing, and topological data analysis and its applications for medical data. Cham: Springer; 2021. p. 13–23.
- [32] Makkuva AV, Oh S, Kannan S, Viswanath P. Learning in gated neural networks. arXiv 2019:1906.02777.
- [33] Donahue J, Hendricks LA, Rohrbach M, Venugopalan S, Guadarrama S, Saenko K, et al. Long-term recurrent convolutional networks for visual recognition and description. IEEE Trans Pattern Anal Mach Intell 2017;39:677–91.