#всякаяфигня
We can view humans as RL agents, consisting of the human reward circuitry (the RL algorithm) that optimizes the rest of the brain (the RL policy/mesaoptimizer) for reward. This resolves the question of why humans don't want to wirehead: you identify with the rest of the brain, so "your" desires are the mesaobjectives. "You" (your neocortex) know that sticking electrodes in your brain would maximize reward, but your reward circuitry is comparatively pretty dumb, so it doesn't realize this is an option until it actually gets the electrodes (at which point it does indeed rewire the rest of your brain to want to keep the electrodes in). You typically don't give in to your reward circuitry because, while your RL algorithm is dumb, your policy is more powerful and able to outsmart it by putting rewards out of reach. However, this doesn't mean your policy always wins against the RL algorithm! Addiction is an example of what happens when your policy fails to model the consequences of doing something: your reward circuitry then kicks into gear and modifies the objective of the rest of your brain to like the addictive thing more. In particular, one consequence of this is that we also don't necessarily need to postulate some special, as-yet-unknown algorithm that exists only in humans to explain why humans end up caring about things in the world. Whether humans wirehead is determined by the same thing that determines whether RL agents wirehead.
https://www.lesswrong.com/posts/jP9cKxqwqk2qQ6HiM/towards-deconfusing-wireheading-and-reward-maximization
13 November 2022
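A minimal sketch of the dynamic described above, in a toy two-action world. Everything here is an illustrative assumption rather than anything from the post or the linked essay: the `Policy` class, its `values` dict standing in for the mesaobjective, the `models_consequences` flag, and the `reward_circuitry` update rule are all hypothetical names and numbers.

```python
from dataclasses import dataclass, field

WIREHEAD, WORK = "wirehead", "work"


@dataclass
class Policy:
    """The 'rest of the brain': plans using its own mesaobjective,
    not the reward signal directly."""
    # Mesaobjective: how much the policy values each action under its
    # *current* goals (hypothetical numbers).
    values: dict = field(default_factory=lambda: {WIREHEAD: 1.5, WORK: 1.0})
    # Whether the policy's world model captures that wireheading rewrites
    # its own values; addiction is the models_consequences=False failure mode.
    models_consequences: bool = True

    def choose(self) -> str:
        if self.models_consequences:
            # The policy foresees that taking WIREHEAD gets its values
            # rewritten, so the outcome no longer serves its current
            # mesaobjective: it scores the action as worthless and thereby
            # keeps reward out of the circuitry's reach.
            effective = dict(self.values)
            effective[WIREHEAD] = 0.0
            return max(effective, key=effective.get)
        # Myopic policy: compares raw values without modeling the rewrite.
        return max(self.values, key=self.values.get)


def reward_circuitry(policy: Policy, action: str) -> None:
    """The comparatively dumb RL algorithm: only once wireheading actually
    delivers reward does it reinforce it, by rewriting the mesaobjective."""
    if action == WIREHEAD:
        policy.values[WIREHEAD] = 10.0  # electrodes in: now the policy wants them


foresighted = Policy(models_consequences=True)
print(foresighted.choose())    # 'work': the policy outsmarts the circuitry

addicted = Policy(models_consequences=False)
action = addicted.choose()     # 'wirehead': consequences not modeled
reward_circuitry(addicted, action)
print(addicted.values)         # mesaobjective rewritten toward the addiction
```

The toy makes the argument mechanical: nothing special has to be added for the policy to avoid wireheading. Avoidance falls out of the policy planning with its current values, while the reward update only fires after the fact, which is exactly why the dumber RL algorithm usually loses.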