| Method | Model | Advice type | References |
|---|---|---|---|
| Reward shaping | Model-free | Contextual instructions | Clouse and Utgoff, 1992 |
| Reward shaping | Model-free | Evaluative feedback | Isbell et al., 2001; Thomaz et al., 2006; Tenorio-Gonzalez et al., 2010; Mathewson and Pilarski, 2016 |
| Reward shaping | Model-based | Contextual instructions | Najar et al., 2015b |
| Reward shaping | Model-based | Evaluative feedback | Knox and Stone, 2010, 2011a, 2012b |
| Value shaping | Model-free | General instructions | Utgoff and Clouse, 1991; Maclin and Shavlik, 1996; Kuhlmann et al., 2004; Maclin et al., 2005a,b; Torrey et al., 2008 |
| Value shaping | Model-free | Evaluative feedback | Dorigo and Colombetti, 1994; Colombetti et al., 1996; Najar et al., 2016 |
| Value shaping | Model-based | Contextual instructions | Najar et al., 2015a, 2016 |
| Value shaping | Model-based | Evaluative feedback | Knox and Stone, 2010, 2011a, 2012b |
| Policy shaping | Model-free | Contextual instructions | Rosenstein et al., 2004 |
| Policy shaping | Model-free | Evaluative feedback | Ho et al., 2015; MacGlashan et al., 2017; Najar et al., 2020b |
| Policy shaping | Model-based | Contextual instructions | Pradyot et al., 2012b; Grizou et al., 2013; Najar et al., 2020b |
| Policy shaping | Model-based | Evaluative feedback | Knox and Stone, 2010, 2011a, 2012b; Lopes et al., 2011; Griffith et al., 2013; Loftin et al., 2016 |
| Policy shaping | Model-based | Corrective feedback | Lopes et al., 2011 |
| Decision biasing | | Guidance | Thomaz and Breazeal, 2006; Suay and Chernova, 2011 |
| Decision biasing | | Contextual instructions | Nicolescu and Mataric, 2003; Rosenstein et al., 2004; Rybski et al., 2007; Thomaz and Breazeal, 2007b; Tenorio-Gonzalez et al., 2010; Cruz et al., 2015 |