Table 6. Overview of image retrieval approaches for fibred endoscopic imaging.
Topic | References | Methodology | Comments |
---|---|---|---|
Image retrieval through low-level visual features | André et al. (2009a) | Bag of Visual Words (k-means clustering) of multi-scale SIFT descriptors extracted from regularly distributed circular regions. | Thorough methodologies for image and video retrieval based solely on low-level information extracted from images. Due to lack of relevant ground truth, methodologies were evaluated as binary classification tasks (instead of retrieval). |
André et al. (2009b) | Introduce (i) spatial information between local features by exploiting the co-occurrence matrix of their visual words, and (ii) temporal relationships across frames through mosaicing. | ||
André et al. (2010) | Deriving visual words from individual frames and weighting the contributions of local regions through the relevant overlap rate derived during mosaicing. | ||
André et al. (2012b,2011b) | Combining and clinically testing above approaches as a binary classification (kNN) between neoplastic/benign colonic epithelium. | ||
André et al. (2011a) | (i) Generate the “perceived similarity” ground truth (manual assessment - Likert scale), and (ii) learn an adjusted similarity/distance metric (linear transform) for optimal mapping of video signatures (histograms of visual words). | First attempt to directly evaluate the performance of endomicroscopic video retrieval by generating a “perceived similarity” ground truth. | |
Image retrieval combining low-level visual features with high-level semantic context | André et al. (2012a,c) | Fisher-based approach transforming visual word histograms to 8 binary semantic concepts. Combine with adjusted similarity distance to improve “perceived similarity”. | Bridging the semantic gap between low-level visual features, extracted from the images, and high-level clinical knowledge, generated through human perception. |
Watcharapichat (2012) | Gabor filter and Earth Mover’s Distance based retrieval enhanced through iterative “relevance feedback” and Isomap dimensionality reduction. | ||
Tous et al. (2012) | Retrieval via (i) low-level, image-based features (LBPs & k-NN with Euclidean or Manhattan distances), (ii) high-level keyword semantic descriptions (Apache Lucene search engine), and (iii) third-party software compatibility through MPEG Query Format & JPEG Search standards. | ||
Other image retrieval approaches | Kohandani Tafresh et al. (2014) | Semi-automated query adaptation of André et al. (2011b) via (i) temporal segmentation based on kinetic stability (Euclidean distance of SIFT descriptors across consecutive frames), and (ii) manual selection of spatially stable segments. | Adaptations of André et al. (2011b) enhancing retrieval performance. |
Gu et al. (2017) | Unsupervised, multimodal graph mining (i) deriving similar (cycle consistency) and dissimilar (geodesic distance) FBEμ and histology frame pairs, (ii) learning discriminative features in the associated latent space. |
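
Several entries in Table 6 share one pipeline: local descriptors are quantised into visual words, each image or video is summarised as a histogram of those words, and retrieval ranks database entries by the distance between histograms, with kNN voting used when the task is evaluated as binary classification. The following is a minimal sketch of that generic pipeline and not the implementation of any cited work. It assumes the codebook is already available (in the cited studies it would come from k-means clustering of SIFT descriptors), it uses a plain Euclidean distance rather than a learned metric, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import dist


def nearest_word(descriptor, codebook):
    """Index of the codebook centre closest (Euclidean) to the descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def bovw_signature(descriptors, codebook):
    """L1-normalised histogram of visual-word occurrences."""
    if not descriptors:
        raise ValueError("at least one descriptor is required")
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    return [counts[k] / len(descriptors) for k in range(len(codebook))]


def retrieve(query_sig, database, k=1):
    """Return the k (label, signature) entries nearest to the query, closest first."""
    return sorted(database, key=lambda item: dist(query_sig, item[1]))[:k]


def knn_label(query_sig, database, k=3):
    """Majority label among the k nearest database entries."""
    votes = Counter(label for label, _ in retrieve(query_sig, database, k))
    return votes.most_common(1)[0][0]


# Toy example: 2-D "descriptors" and a two-word codebook (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0)]
database = [
    ("benign", [0.9, 0.1]),
    ("neoplastic", [0.2, 0.8]),
    ("benign", [0.8, 0.2]),
    ("neoplastic", [0.1, 0.9]),
]
query = bovw_signature([(0.9, 0.9), (1.1, 1.0), (0.0, 0.1), (1.0, 1.0)], codebook)
print(knn_label(query, database, k=3))
```

Replacing `dist` in `retrieve` with a learned or perceptually adjusted distance is where approaches such as metric learning on video signatures would differ from this baseline.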