
Figure 2: Model Architecture.

A developer writes SFs (λ_{i=1,…,k}) over input data and specifies any (a) backbone architecture (e.g., ResNet [15], BERT [12]) as a feature extractor. Extracted features are shared inputs to the k slice-residual attention modules; each module learns a (b) slice indicator head, which is supervised by the corresponding λ_i, and a (c) slice expert representation, which is trained only on examples belonging to the slice using a (d) shared slice prediction head. An attention mechanism reweights these representations into a combined (e) slice-aware representation. A final (f) prediction head makes model predictions based on the slice-aware representation.
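To make the (a)–(f) data flow concrete, here is a minimal PyTorch sketch of the forward pass. It is an illustration under stated assumptions, not the authors' implementation: the class name `SliceResidualAttentionModel`, the linear heads, and the attention weighting (a plain softmax over indicator logits) are hypothetical stand-ins, and the training losses that supervise the indicator heads with the SFs λ_i and the shared slice head on in-slice examples are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SliceResidualAttentionModel(nn.Module):
    """Hypothetical sketch of the Figure 2 architecture; sizes are assumptions."""

    def __init__(self, backbone: nn.Module, feat_dim: int,
                 num_slices: int, num_classes: int):
        super().__init__()
        self.backbone = backbone  # (a) any feature extractor (e.g. ResNet, BERT)
        # (b) one slice indicator head per SF lambda_i
        self.indicator_heads = nn.ModuleList(
            [nn.Linear(feat_dim, 1) for _ in range(num_slices)])
        # (c) one slice expert representation per slice
        self.expert_transforms = nn.ModuleList(
            [nn.Linear(feat_dim, feat_dim) for _ in range(num_slices)])
        # (d) shared slice prediction head, applied to every expert representation
        self.shared_slice_head = nn.Linear(feat_dim, num_classes)
        # (f) final prediction head over the (e) slice-aware representation
        self.final_head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feats = self.backbone(x)                                   # [B, feat_dim]
        # (b) indicator logits, one per slice; supervised by lambda_i in training
        ind_logits = torch.cat(
            [h(feats) for h in self.indicator_heads], dim=-1)      # [B, k]
        # (c) slice expert representations
        experts = torch.stack(
            [t(feats) for t in self.expert_transforms], dim=1)     # [B, k, feat_dim]
        # (d) shared head scores each expert; trained only on in-slice examples
        slice_logits = self.shared_slice_head(experts)             # [B, k, classes]
        # Attention reweighting: a softmax over indicator logits stands in for
        # the paper's attention mechanism (an assumption of this sketch)
        attn = F.softmax(ind_logits, dim=-1).unsqueeze(-1)         # [B, k, 1]
        slice_aware = (attn * experts).sum(dim=1)                  # (e) [B, feat_dim]
        return self.final_head(slice_aware), ind_logits, slice_logits  # (f)

# Toy usage with a linear backbone standing in for ResNet/BERT features.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32, 64))
model = SliceResidualAttentionModel(backbone, feat_dim=64,
                                    num_slices=3, num_classes=2)
final_logits, ind_logits, slice_logits = model(torch.randn(8, 32))
```

The returned indicator and slice logits are exposed alongside the final prediction because, in the scheme the caption describes, each is trained against its own target (the SF outputs and the task labels on in-slice examples, respectively).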