The Ethics of Reward Shaping
Choosing an effective loss function is a critical part of training ML models. This thought-provoking article reminds us to scrutinise that choice, especially since in many models the reward function itself is unclear. Does a recommendation system (e.g. one promoting new articles or songs) simply create an echo chamber, or does it broadly converge on the mean? Which of these should score higher? If we penalise the system when users don't click on articles that challenge their existing beliefs, are we acting ethically?
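To make the question concrete, here is a minimal sketch of how the choice of reward can flip which recommendation slate "scores higher". Everything in it is an assumption for illustration and is not taken from the article: the function names, the topic labels, the click data, and the idea of adding a topic-entropy bonus are all hypothetical, and entropy is only one of many possible proxies for diversity.

```python
from collections import Counter
from math import log2


def click_reward(clicks):
    """Click-only reward: fraction of recommended items the user clicked."""
    return sum(clicks) / len(clicks)


def topic_entropy(topics):
    """Shannon entropy (in bits) of the topic mix in one slate."""
    n = len(topics)
    return -sum((c / n) * log2(c / n) for c in Counter(topics).values())


def shaped_reward(clicks, topics, diversity_weight=0.0):
    """Click reward plus an optional (hypothetical) diversity bonus."""
    return click_reward(clicks) + diversity_weight * topic_entropy(topics)


# Hypothetical slate 1: an echo chamber. One topic, every item clicked.
echo_topics = ["politics_a"] * 4
echo_clicks = [1, 1, 1, 1]

# Hypothetical slate 2: a mixed slate. Four topics, half the items clicked.
mixed_topics = ["politics_a", "politics_b", "science", "sport"]
mixed_clicks = [1, 0, 1, 0]

for weight in (0.0, 0.5):
    echo = shaped_reward(echo_clicks, echo_topics, weight)
    mixed = shaped_reward(mixed_clicks, mixed_topics, weight)
    winner = "echo chamber" if echo > mixed else "mixed slate"
    print(f"diversity_weight={weight}: {winner} scores higher")
```

With a click-only reward the echo chamber wins; with the diversity bonus switched on, the mixed slate wins. The code cannot say which weight is right, which is precisely the ethical choice the paragraph above is pointing at.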
Musings on systems, information, learning, and optimization.