Figure - PMC

Skip to main content

An official website of the United States government

Here's how you know

Here's how you know

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

View full-text article in PMC

. 2024 Oct 8;6(4):19–42. doi: 10.46989/001c.124131

Search in PMC
Search in PubMed
View in NLM Catalog
Add to search

This is an open access article distributed under the terms of the Creative Commons Attribution License (4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PMC Copyright notice

Figure 3. — (A) At a high level, deep learning models consist of an encoder, which transforms or condenses the input data into a more informative intermediate representation (latent space), and a decoder or prediction head, which generates the desired output from this latent representation. (B) Recurrent neural networks (RNNs) process data sequentially, updating a hidden state (h) at each step by incorporating information from the current input and the previous hidden state. For example, when processing the sentence “she is examining a blood smear,” the first hidden state (h₁) is generated based on the word “she.” The second hidden state (h₂) is then computed using the second word “is” and h₁, allowing it to capture information from both the current and previous words. This process continues for each word in the sequence, with the final hidden state (h_n) incorporating information from all the preceding words. (C) Bidirectional Encoder Representations from Transformers (BERT) utilizes the encoder component of the Transformer model, which consists of self-attention layers and multi-layer perceptrons (MLPs). During the training process, the objective is to predict randomly masked words in a sentence. Although the words are masked, the self-attention mechanism allows BERT to capture the contextual relationships between words, enabling it to infer the semantic meaning based on the surrounding context. (D) Residual connections in convolutional neural networks (CNN) enable the direct flow of information by skipping one or more layers, facilitating the creation of deeper networks. (E) The Vision Transformer (ViT) is a novel approach to image recognition tasks that adapts the Transformer architecture. In ViT, an input image is divided into small patches where self-attention is performed. A special classification token, denoted as “CLS” in the figure, is appended to the patch embeddings and participates in the self-attention process, allowing it to gather information from all patches. The output representation corresponding to the “CLS” token is then used for image classification or other downstream tasks. (F) The U-Net is a specialized CNN architecture that has gained popularity in medical image segmentation tasks due to its ability to perform pixel-level classification. The U-Net consists of an encoder path, which uses convolutional layers to encode it into a compact latent representation, followed by a symmetric decoder path that employs transposed convolutions to gradually restore the latent feature back to the original image resolution. Residual connections between corresponding encoder and decoder layers allow for the direct transfer of localized spatial information. In the example, the U-Net segments the blast cell from the background by assigning the value 1 to pixels within the blast and 0 to the background pixels.