#всякаяфигня
We can view humans as RL agents, consisting of the human reward circuitry (the RL algorithm) that optimizes the rest of the brain (the RL policy/mesaoptimizer) for reward. This resolves the question of why humans don't want to wirehead: you identify with the rest of the brain, so "your" desires are the mesaobjectives. "You" (your neocortex) know that sticking electrodes in your brain would maximize reward, but your reward circuitry is comparatively pretty dumb, so it doesn't realize this is an option until it actually gets the electrodes (at which point it does indeed rewire the rest of your brain to want to keep the electrodes in). You typically don't give in to your reward circuitry because, while your RL algorithm is dumb, your policy is more powerful and able to outsmart it by putting rewards out of reach. However, this doesn't mean your policy always wins against the RL algorithm! Addiction is an example of what happens when your policy fails to model the consequences of doing something: your reward circuitry then kicks into gear and modifies the objective of the rest of your brain to like the addictive thing more. In particular, one consequence of this is that we also don't necessarily need to postulate some special, as-yet-unknown algorithm that exists only in humans to explain why humans end up caring about things in the world. Whether humans wirehead is determined by the same thing that determines whether RL agents wirehead.
https://www.lesswrong.com/posts/jP9cKxqwqk2qQ6HiM/towards-deconfusing-wireheading-and-reward-maximization
13 November 2022
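A minimal sketch of the dynamic described above, in a toy two-action world. Everything here is an illustrative assumption rather than anything from the post or the linked essay: the `Policy` class, its `values` dict standing in for the mesaobjective, the `models_consequences` flag, and the `reward_circuitry` update rule are all hypothetical names and numbers.

```python
from dataclasses import dataclass, field

WIREHEAD, WORK = "wirehead", "work"


@dataclass
class Policy:
    """The 'rest of the brain': plans using its own mesaobjective,
    not the reward signal directly."""
    # Mesaobjective: how much the policy values each action under its
    # *current* goals (hypothetical numbers).
    values: dict = field(default_factory=lambda: {WIREHEAD: 1.5, WORK: 1.0})
    # Whether the policy's world model captures that wireheading rewrites
    # its own values; addiction is the models_consequences=False failure mode.
    models_consequences: bool = True

    def choose(self) -> str:
        if self.models_consequences:
            # The policy foresees that taking WIREHEAD gets its values
            # rewritten, so the outcome no longer serves its current
            # mesaobjective: it scores the action as worthless and thereby
            # keeps reward out of the circuitry's reach.
            effective = dict(self.values)
            effective[WIREHEAD] = 0.0
            return max(effective, key=effective.get)
        # Myopic policy: compares raw values without modeling the rewrite.
        return max(self.values, key=self.values.get)


def reward_circuitry(policy: Policy, action: str) -> None:
    """The comparatively dumb RL algorithm: only once wireheading actually
    delivers reward does it reinforce it, by rewriting the mesaobjective."""
    if action == WIREHEAD:
        policy.values[WIREHEAD] = 10.0  # electrodes in: now the policy wants them


foresighted = Policy(models_consequences=True)
print(foresighted.choose())    # 'work': the policy outsmarts the circuitry

addicted = Policy(models_consequences=False)
action = addicted.choose()     # 'wirehead': consequences not modeled
reward_circuitry(addicted, action)
print(addicted.values)         # mesaobjective rewritten toward the addiction
```

The toy makes the argument mechanical: nothing special has to be added for the policy to avoid wireheading. Avoidance falls out of the policy planning with its current values, while the reward update only fires after the fact, which is exactly why the dumber RL algorithm usually loses.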