Rewards in Reinforcement Learning Make Machines Behave Like Humans

AI abilities do not emerge from complex problem-solving methods but from reinforcement learning

Randomness is the last thing we want in our lives, at least during the busy part of the day, such as when you want to catch up on the latest from an IPL match. Your browser will most likely surface the most recent IPL updates even though you haven't reacted to IPL news with likes or tweets in the past few days. This is how news recommendations work. How is that possible? Reinforcement learning is the name of the game. AI algorithms are known for taking data inputs and finding a pattern, then generating a result in line with results produced under similar circumstances. That works when the circumstances are not too random. But playing a game can seem almost entirely random, given the quirks and fancies of the human mind. How can reinforcement learning train a machine to react in such situations?

Reinforcement learning basically means letting the machine learn by itself from the results of its past actions rather than identifying a pattern in the data it is fed. Some see this as a step from artificial narrow intelligence towards artificial general intelligence, which aims to make machines think for themselves. It works on the principle that intuition grows with iterative learning: making mistakes, checking the result, adjusting the process, and repeating. This is mostly done with reinforcement learning and deep reinforcement learning algorithms, and rewards play a key role in helping machines improve their performance. A recent paper, 'Reward is enough', published in the peer-reviewed journal Artificial Intelligence by DeepMind researchers David Silver, Satinder Singh, Doina Precup, and Richard Sutton, postulates that general artificial intelligence abilities emerge not from complex problem-solving methods but from reward maximization.
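The loop described above (act, check the result, adjust, repeat) can be sketched with one of the simplest reinforcement learning setups, an epsilon-greedy multi-armed bandit. This is only an illustrative sketch and not the method of the 'Reward is enough' paper; the function name `run_bandit`, the reward probabilities, and the parameter values are assumptions chosen for the example.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that learns only from reward feedback.

    true_means holds the hidden probability that each action pays a
    reward of 1. The agent never sees these values directly.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running estimate of each action's reward
    counts = [0] * n_arms       # how many times each action was tried
    total_reward = 0.0

    for _ in range(steps):
        # Act: explore at random with probability epsilon,
        # otherwise exploit the action with the best current estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Check the result: the environment returns a noisy 0/1 reward.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Adjust: move the estimate toward the observed reward
        # (incremental average), then repeat.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    # Hypothetical environment with three actions of differing quality.
    estimates, counts, total = run_bandit([0.2, 0.5, 0.8])
    print("estimated rewards:", [round(e, 2) for e in estimates])
    print("times each action was chosen:", counts)
    print("total reward collected:", total)
```

The agent starts with no knowledge of which action is best. Its early choices are effectively mistakes, and the reward signal alone steers it toward the action that pays off most often.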

Does reward maximization work?

Through this paper, the authors attempt to define reward as the only thing needed to design a system that lets a machine thrive in its environment. The paper's propositions about what constitutes intelligence, environment, and learning are rather unclear. It explains the evolution of intelligence through the maximization of rewards while also defining reward maximization as the only way to gain intelligence, which makes the argument appear circular. This is akin to a cat that learns to follow cues when rewarded with snacks, while the cat itself believes that bingeing on snacks is the same as learning the cues.

According to the authors, systems do not require any prior knowledge about the environment, as the agent is capable of using rewards as its way of learning. The paper lays more stress on maximizing rewards than on how rewards are defined or how the environment is designed. If a system maximizes reward very effectively in a poorly defined environment, the results might turn out to be counterproductive. There is also no agreed method to quantify rewards. How would one quantify feelings like happiness, gratification, and a sense of achievement, which the human psyche very much treats as rewards?
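To make the concern about poorly defined rewards concrete, here is a toy sketch in the spirit of the news recommendation example. All article names and numbers are invented for illustration, and `best_article` is a hypothetical helper. It only shows that an agent maximizing an easy-to-measure proxy reward (clicks) can pick a different action than one maximizing what we actually care about (reader satisfaction).

```python
# Hypothetical articles with made-up scores, for illustration only.
ARTICLES = {
    "clickbait_headline": {"click_rate": 0.9, "reader_satisfaction": 0.2},
    "match_report": {"click_rate": 0.6, "reader_satisfaction": 0.8},
    "in_depth_analysis": {"click_rate": 0.3, "reader_satisfaction": 0.9},
}


def best_article(reward_key):
    """Return the article a pure reward maximizer would recommend
    when the reward is defined by the given score."""
    return max(ARTICLES, key=lambda name: ARTICLES[name][reward_key])


if __name__ == "__main__":
    # Reward defined as clicks: easy to quantify, but only a proxy.
    print("maximizing clicks:", best_article("click_rate"))
    # Reward defined as satisfaction: closer to the goal, harder to measure.
    print("maximizing satisfaction:", best_article("reader_satisfaction"))
```

The maximizer is equally effective in both cases. What changes the outcome is how the reward was defined, which is exactly the part the critique says receives too little attention.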

Reward maximization may well help researchers move towards general intelligence, provided they treat it as a necessary but not sufficient condition. Until then, it is in the best interests of the tech community to treat the paper's claim as just a conjecture.

Analytics Insight
www.analyticsinsight.net