2020 Jun 9;117(48):30039–30045. doi: 10.1073/pnas.1907369117

Fig. 1.

The top graphs represent functions. Each bottom diagram (Insets) depicts the ideal network approximating the function above it. (Inset A) A shallow universal network in 8 variables and N units approximates a generic function of 8 variables, $f(x_1, \ldots, x_8)$. (Inset B) A hierarchical network in $n = 8$ variables, which approximates well functions of the form $f(x_1, \ldots, x_8) = h_3(h_{21}(h_{11}(x_1, x_2), h_{12}(x_3, x_4)), h_{22}(h_{13}(x_5, x_6), h_{14}(x_7, x_8)))$, as represented by the binary graph above it. In the approximating network, each of the $n - 1$ nodes in the graph of the function corresponds to a set of ReLU units. Like the shallow network, a hierarchical network is universal, that is, it can approximate any continuous function; the text discusses how it can approximate a subclass of compositional functions exponentially better than a shallow network. Redrawn from ref. 23.
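The binary-tree composition in Inset B can be illustrated with a minimal Python sketch. It is not the network from the paper: the helper names (`evaluate_binary_tree`, `count_nodes`) and the constituent function `h` are hypothetical, chosen only to show how 8 inputs are combined pairwise over three levels and why the graph has n − 1 = 7 nodes.

```python
from typing import Callable, List, Sequence

# A constituent function takes two inputs and returns one value,
# like h11, h12, ..., h3 in the caption.
Constituent = Callable[[float, float], float]


def evaluate_binary_tree(x: Sequence[float],
                         levels: List[List[Constituent]]) -> float:
    """Evaluate a compositional function laid out as a binary tree.

    levels[0] holds the functions applied to adjacent input pairs,
    levels[1] the functions applied to their outputs, and so on.
    """
    values = list(x)
    for level in levels:
        if 2 * len(level) != len(values):
            raise ValueError("each level needs one function per pair of values")
        values = [h(values[2 * i], values[2 * i + 1])
                  for i, h in enumerate(level)]
    if len(values) != 1:
        raise ValueError("the tree must reduce to a single output")
    return values[0]


def count_nodes(levels: List[List[Constituent]]) -> int:
    """Number of constituent functions (internal nodes) in the tree."""
    return sum(len(level) for level in levels)


def relu(z: float) -> float:
    return max(0.0, z)


def h(a: float, b: float) -> float:
    # Hypothetical constituent function (an assumption for illustration):
    # a single ReLU unit acting on the sum of its two inputs.
    return relu(a + b)


# Three levels for n = 8 inputs: 4 + 2 + 1 = 7 = n - 1 nodes.
levels = [[h] * 4, [h] * 2, [h]]
x = [1.0, -2.0, 3.0, 4.0, -5.0, 1.0, 2.0, 2.0]
y = evaluate_binary_tree(x, levels)
print(y, count_nodes(levels))
```

In this toy setup every node sees only two inputs. That local, two-variable structure is what the hierarchical network in Inset B mirrors, whereas the shallow network in Inset A treats all 8 variables at once.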