OpenAI Archives - Bskog AI

AI wins against the best professional Dota 2 players

OpenAI developed an AI that beats the best professional Dota 2 players in the world in 1-on-1 games. It does not use imitation learning or tree search. Instead, it learns by playing against a copy of itself, continuously improving. The game is very complicated, and an AI coded by hand would probably make a rather poor player. By having the computer teach itself to play, it discovers a lot of tactics on its own.

Read more at:
https://blog.openai.com/dota-2/
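The core idea of self-play can be illustrated with a minimal sketch. This is not OpenAI's actual system (which trains a large neural network on Dota 2); it is a toy rock-paper-scissors stand-in, where a simple policy plays against a frozen past copy of itself and reinforces whichever actions win:

```python
import copy
import random

class Policy:
    """Toy policy: one preference weight per action (rock, paper, scissors)."""
    def __init__(self):
        self.weights = [1.0, 1.0, 1.0]

    def act(self):
        # Sample an action in proportion to its weight.
        r = random.uniform(0, sum(self.weights))
        for action, w in enumerate(self.weights):
            r -= w
            if r <= 0:
                return action
        return len(self.weights) - 1

def play(a, b):
    """Return +1 if action a beats b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def self_play_train(episodes=2000, lr=0.05, refresh_every=200):
    current = Policy()
    opponent = copy.deepcopy(current)   # frozen past copy of itself
    for ep in range(episodes):
        a, b = current.act(), opponent.act()
        reward = play(a, b)
        # Reinforce the chosen action in proportion to the outcome,
        # keeping every weight strictly positive.
        current.weights[a] = max(0.01, current.weights[a] + lr * reward)
        if ep % refresh_every == 0:
            opponent = copy.deepcopy(current)  # opponent keeps pace
    return current
```

Periodically refreshing the opponent is the key trick: the agent always faces a rival of roughly its own strength, so it keeps getting useful learning signal as it improves.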

The blog post also shows examples of tactics it learned by itself.

Feedback from humans to help machines learn

DeepMind and OpenAI collaborated on an interesting project where they discovered how to use human feedback to provide the reward signal for a deep reinforcement learning algorithm. The goal is to improve AI safety by letting humans correct wrong behavior through direct intervention.

One example is teaching a robot to perform a backflip. It learns through reinforcement learning, and periodically asks a human which of two alternatives is better. The human's choice is used to train a reward predictor, which then supplies the reward in the reinforcement learning process.

The idea is that the algorithm tries different behaviors and presents pairs of alternatives to the human, who chooses which one comes closer to the goal of performing a backflip. The algorithm then generates its own reward estimates, continues learning, and later checks in with the human again to see how it has improved and which alternative is now the best. Training the robot to perform a backflip required about 900 such human judgements.

This method is helpful in situations where it is difficult to specify a reward function by hand.
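The reward-predictor step can be sketched with a minimal model. This is an illustrative simplification, not the paper's actual implementation: it fits a linear reward over hand-made clip features using the Bradley-Terry preference model (a common choice for pairwise comparisons), where each human judgement says which of two clips looked better:

```python
import math

def predicted_reward(w, features):
    """Linear reward model: r(clip) = w . features(clip)."""
    return sum(wi * fi for wi, fi in zip(w, features))

def train_reward_model(preferences, dim, epochs=200, lr=0.1):
    """Fit reward weights from pairwise human comparisons.

    `preferences` is a list of (features_a, features_b, choice),
    where choice = 0 means the human preferred clip a, 1 means clip b.
    Under the Bradley-Terry model, P(a preferred) = sigmoid(r_a - r_b).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for fa, fb, choice in preferences:
            ra = predicted_reward(w, fa)
            rb = predicted_reward(w, fb)
            p_a = 1.0 / (1.0 + math.exp(rb - ra))  # P(human prefers a)
            target_a = 1.0 if choice == 0 else 0.0
            # Cross-entropy gradient step toward the human's choice.
            scale = target_a - p_a
            for i in range(dim):
                w[i] += lr * scale * (fa[i] - fb[i])
    return w

# Hypothetical usage: one feature per clip (say, peak height of the flip),
# and a human who consistently prefers the higher clip.
prefs = [([1.0], [0.0], 0), ([0.0], [1.0], 1), ([0.8], [0.2], 0)]
w = train_reward_model(prefs, dim=1)
```

After training, `predicted_reward` ranks new clips the way the human would, so the reinforcement learner can optimize it continuously between the occasional human check-ins.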

This iterative approach to learning means that a human can spot and correct any undesired behaviours, a crucial part of any safety system. The design also does not put an onerous burden on the human operator, who only has to review around 0.1% of the agent's behaviour to get it to do what they want. However, this can mean reviewing several hundred to several thousand pairs of clips, something that will need to be reduced to make it applicable to real-world problems.

Read about it here:
The Paper
A blog post from DeepMind
OpenAI’s blog post