CheXDet architecture. An EfficientNet backbone extracts features and
progressively downsamples them in width and height. The
multiscale features (ie, p2, p3, p4, p5, and p6) are then fed into three
bidirectional feature pyramid network (BiFPN) layers for information
aggregation and enrichment. The BiFPN performs top-down feature
aggregation (red arrows), bottom-up feature aggregation (green arrows),
and same-scale feature aggregation (blue arrows). Next, a region
proposal network (RPN) module and a region
of interest (ROI) alignment module are used to generate bounding-box
proposals based on the BiFPN features. The proposal features are then
passed through four convolutional (conv) layers. Finally, two fully
connected layers perform classification and bounding-box regression,
respectively, on the proposal features to generate the predictions.
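
The multiscale aggregation described above can be illustrated with a minimal sketch. It is not the CheXDet implementation. It assumes that level p_k has a stride of 2^k relative to the input (a common convention that the caption does not state), and it replaces the learned weighted fusion and convolutions of a real BiFPN with a plain element-wise mean. The helper names (pyramid_shapes, upsample2x, downsample2x, fuse, bifpn_layer) are hypothetical and each feature map is a single-channel nested list.

```python
# Illustrative sketch only; strides, the fusion rule, and all names are assumptions.

def pyramid_shapes(height, width, levels=(2, 3, 4, 5, 6)):
    """Spatial size of each level p_k, assuming a stride of 2**k (ceil division)."""
    return {f"p{k}": (-(-height // 2**k), -(-width // 2**k)) for k in levels}

def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of a 2-D nested list."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def downsample2x(grid):
    """2x2 max pooling; assumes even height and width."""
    return [[max(grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1])
             for j in range(0, len(grid[0]), 2)]
            for i in range(0, len(grid), 2)]

def fuse(*grids):
    """Element-wise mean of same-sized grids (stand-in for learned fusion)."""
    n = len(grids)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*grids)]

def bifpn_layer(feats):
    """One bidirectional pass over feats, ordered fine to coarse (p2..p6)."""
    n = len(feats)
    # Top-down pass: each finer level receives the upsampled coarser level.
    td = list(feats)
    for i in range(n - 2, -1, -1):
        td[i] = fuse(feats[i], upsample2x(td[i + 1]))
    # Bottom-up pass: each coarser level receives the downsampled finer output,
    # plus a same-scale connection from the layer input.
    out = list(td)
    for i in range(1, n):
        inputs = [feats[i], downsample2x(out[i - 1])]
        if i < n - 1:  # intermediate levels also reuse their top-down result
            inputs.append(td[i])
        out[i] = fuse(*inputs)
    return out

# Level sizes for a hypothetical 512 x 512 input under the stride assumption.
shapes = pyramid_shapes(512, 512)

# Toy pyramid: five constant maps of size 16, 8, 4, 2, and 1.
feats = [[[float(k)] * s for _ in range(s)]
         for k, s in zip(range(2, 7), (16, 8, 4, 2, 1))]
for _ in range(3):  # three stacked BiFPN layers, as in the caption
    feats = bifpn_layer(feats)
sizes = [(len(f), len(f[0])) for f in feats]
print(shapes)
print(sizes)
```

Each layer preserves the spatial size of every level, so the three layers can be stacked directly and the RPN and ROI alignment stages see the same five scales that the backbone produced.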