Learning from a noisy linear teacher. (A) A dataset of $P$ examples is created by providing random inputs to a teacher network with weight vector $\bar{w}$, and corrupting the teacher's outputs with noise of variance $\sigma_\epsilon^2$. (B) A student network is then trained on this dataset. (C) Example dynamics of the student network during full-batch gradient descent training. Training error (blue) decreases monotonically. Test error, also referred to as generalization error (yellow), here computable exactly (Eq. (4)), decreases to a minimum at the optimal early stopping time before increasing at longer times, a phenomenon known as overtraining. Because of noise in the teacher's output, the best possible student network attains finite generalization error ("oracle", green) even with infinite training data; this error is the approximation error. The difference between the test error and this best-possible error is the estimation error. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
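To make the dynamics in panel (C) concrete, a minimal simulation sketch of the teacher-student setup follows. It assumes isotropic Gaussian inputs and a linear student trained by full-batch gradient descent on the mean squared error; all parameter values (N, P, sigma_eps, lr, steps) are illustrative choices, not the paper's, and the test-error formula used here is the standard expectation over fresh examples for this setting, not the paper's Eq. (4).

```python
# Sketch of the noisy-linear-teacher experiment; all parameter values
# below are illustrative assumptions, not the paper's exact settings.
import numpy as np

rng = np.random.default_rng(0)

N = 100            # input dimension
P = 100            # number of training examples
sigma_eps = 0.5    # std of the teacher's output noise
lr = 1e-2          # gradient descent learning rate
steps = 2000       # full-batch gradient descent steps

# (A) Teacher: random weight vector w_bar; noisy labels y = X w_bar + eps.
w_bar = rng.standard_normal(N) / np.sqrt(N)
X = rng.standard_normal((P, N))
y = X @ w_bar + sigma_eps * rng.standard_normal(P)

# (B) Student: linear network with weights w, trained by full-batch
# gradient descent on the mean squared training error.
w = np.zeros(N)
train_err, test_err = [], []
for t in range(steps):
    resid = X @ w - y
    grad = 2.0 * X.T @ resid / P      # gradient of mean squared error
    w -= lr * grad
    train_err.append(np.mean(resid**2))
    # (C) For x ~ N(0, I), the expected test error on fresh noisy examples
    # is exactly ||w - w_bar||^2 + sigma_eps^2; the oracle (w = w_bar)
    # attains sigma_eps^2, the approximation error.
    test_err.append(np.sum((w - w_bar) ** 2) + sigma_eps**2)

t_opt = int(np.argmin(test_err))
print(f"optimal early-stopping step: {t_opt}")
print(f"test error at t_opt:  {test_err[t_opt]:.3f}  vs oracle: {sigma_eps**2:.3f}")
print(f"test error at t={steps}: {test_err[-1]:.3f}  (rise above t_opt = overtraining)")
```

With P = N, the empirical input covariance has eigenvalues arbitrarily close to zero, so the student fits the label noise along those directions at long times: training error keeps falling while test error passes through a minimum and then rises, reproducing the overtraining behavior in panel (C).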