Echocardiogram Vector Embeddings Via R3D Transformer for the Advancement of Automated Echocardiography
Echocardiograms, represented as black frame sequences in the blue-shaded boxes, are fed into the R3D model, where 3D convolutions capture spatiotemporal patterns and progressively reduce the video to a 400-dimensional fully connected layer that precedes the sigmoid head. We extract the output of this last fully connected layer as the vector embedding of an echocardiogram, and principal component analysis (PCA) of these embeddings shows that ejection fraction (EF) patterns are preserved among them. Black arrows show equivalence; black lines highlight dimensions; blue lines represent a 3D convolution; purple shapes represent the vector embeddings; blue and green shapes represent convolutional and pooling layers; orange shapes represent fully connected and dropout layers; the purple shape represents the sigmoid layer.
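The PCA step mentioned in the legend can be illustrated with a minimal sketch. It assumes the embeddings are stacked into an array of shape (number of echocardiograms, 400); the function name `pca_project` and the choice of 2 components are illustrative assumptions, not details taken from the study.

```python
import numpy as np


def pca_project(embeddings, n_components=2):
    """Project row-wise embeddings onto their top principal components.

    Hypothetical helper: `embeddings` is assumed to have shape
    (n_videos, 400), one row per echocardiogram.
    """
    # Center each embedding dimension at zero.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of `vt` are the principal directions, ordered by variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Fraction of total variance captured by each retained component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return projected, explained[:n_components]
```

Coloring the projected points by EF label would then show whether high-EF and low-EF videos separate along the leading components.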
Our workflow is twofold: first, we train the R3D transformer to discriminate between high and low EF; second, we use the trained R3D to generate a vector embedding for each echocardiogram in the EchoNet dataset.