Computers have long since beaten humans at chess; that was a great breakthrough and a milestone in AI.
The world's strongest chess engine, Stockfish, was recently dethroned by AlphaZero, a deep reinforcement learning AI from Google's DeepMind.
Here is a walkthrough of the third game, with further explanation on chess.com: “How Does AlphaZero Play Chess?”
Watch “AlphaZero vs Stockfish Chess Match: Game 3” from Chess on www.twitch.tv
You have probably heard of DeepMind's AlphaGo, which beat the world's best Go player at a game everyone said computers would need another ten years to master.
That version trained on millions of expert human games and then improved by playing against itself through reinforcement learning.
This version skips human gameplay entirely and learns through a novel self-play reinforcement learning method. It is given only the rules of the game, starts playing against itself, and keeps the adjusted versions that improve.
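The core loop can be sketched in a few lines. This toy is only an illustration of "play against yourself, keep what improves"; the real AlphaZero uses a deep network plus Monte Carlo tree search, not a single number as its policy:

```python
import random

def play_game(policy_a, policy_b):
    """Toy game: policy_a wins with probability proportional to its strength."""
    return 1 if random.random() < policy_a / (policy_a + policy_b) else 0

def self_play_train(rounds=200, games_per_round=50):
    champion = 1.0                                   # current best version
    for _ in range(rounds):
        # propose a slightly adjusted challenger version
        challenger = max(champion + random.uniform(-0.1, 0.1), 0.01)
        wins = sum(play_game(challenger, champion) for _ in range(games_per_round))
        if wins > games_per_round // 2:              # keep only versions that improve
            champion = challenger
    return champion

print(self_play_train())
```

The key property is that no human games are involved anywhere: the only signal is the win/loss outcome of self-play matches.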
Blog Post: https://deepmind.com/blog/alphago-zero-learning-scratch/
Research Page: https://deepmind.com/research/alphago/
If you would like to replicate the research, there is an open source project based on the paper: https://github.com/gcp/leela-zero. However, to get the same results as AlphaGo Zero you would need the same network weights, and to reach similar weights you would need access to the same computing power DeepMind had; on commodity computers it would take about 1,700 years. The project's aim is therefore a distributed effort to repeat the work.
Unity has released a new SDK supporting machine learning agents in the Unity game engine. This enables you to:
- Study complex multi-agent behaviors in realistic competitive and cooperative scenarios. This is a lot safer than doing it with robots.
- Study multi-agent behavior for industrial robotics, autonomous vehicles and other applications in a realistic environment.
- Create intelligent agents for your games.
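To get a feel for what such an SDK exposes, here is a generic agent-environment interaction loop with a stand-in toy environment. The `ToyEnv` class and its method signatures are made up for illustration; the actual Unity ML-Agents API differs:

```python
class ToyEnv:
    """Stand-in environment: move from position 0 to position 10."""
    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                  # action: +1 or -1
        self.pos += action
        done = self.pos >= 10
        reward = 1.0 if done else -0.01      # small penalty per step
        return self.pos, reward, done

def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total

print(run_episode(ToyEnv(), lambda obs: 1))  # policy: always move right
```

A learning algorithm would replace the fixed lambda with a policy that improves from the observed rewards; the environment itself (rendering, physics, other agents) is what the Unity side provides.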
The benefit is that it will be much easier to develop and test learning algorithms that can later be used in real life. There is also a potential danger: in the same way that we can test industrial robots, autonomous vehicles and so on before porting them into the real world, we will inevitably see very smart AI agents driving opponents in realistic war games, and these can also be ported into the real world.
Given that deep reinforcement learning already beats the best human players in a growing list of games, the more realistic the game gets, the scarier it is to imagine what would happen if you plugged such an AI into fighter jets or autonomous tanks.
Paper: A Brief Survey of Deep Reinforcement Learning
Authors: Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, Anil Anthony Bharath
Submitted: 19 Aug 2017
Read the PDF
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
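To give a flavour of the value-based methods the survey covers, here is a minimal tabular Q-learning sketch on a toy chain problem. A deep Q-network replaces the lookup table below with a neural network reading raw pixels; the problem and hyperparameters here are invented for illustration:

```python
import random

random.seed(0)

# Chain MDP: states 0..4, goal is state 4; actions move right (+1) or left (-1).
N_STATES, ACTIONS = 5, (+1, -1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1

for _ in range(2000):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Bellman update: move Q(s,a) towards reward plus discounted best next value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the greedy action in every non-terminal state is "move right"
print(all(Q[(s, +1)] > Q[(s, -1)] for s in range(N_STATES - 1)))
```

Policy-based methods like TRPO and actor-critic methods like A3C, also covered in the survey, instead learn the policy directly rather than deriving it from a value table.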
OpenAI developed an AI that beats the best professional Dota 2 players in the world in 1-on-1 games. It does not use imitation learning or tree search. Instead it learns by playing against a copy of itself, continuously improving. The game is very complicated, and if you coded the AI by hand you would probably end up with a rather poor player; by having the computer teach itself to play, it learns a lot of tactics.
Read more at:
Here are tactics it learned by itself:
AIs learning to play Atari games are very impressive, and beating Go champions was an eye-opener to the world. Now DeepMind, together with Blizzard, has released StarCraft II as an AI research environment.
It will be very interesting to see what happens and to try it out.
I have attempted to create AI scripts for Age of Empires II (which is the best game ever, by the way), and there are quite good scripts for it. They are, however, limited by the API the AOE2 scripting engine exposes: the scripts are simply looped over and over, and when a condition is met, that particular rule is executed.
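That condition-rule loop can be sketched like this. The rules and resource numbers are invented for illustration; the real AOE2 scripting language is its own rule-based DSL:

```python
# Game state the rules read and modify
state = {"food": 0, "villagers": 3}

rules = [
    # (condition, action) pairs, checked on every pass of the loop
    (lambda s: s["villagers"] < 10 and s["food"] >= 50,       # can afford a villager
     lambda s: s.update(food=s["food"] - 50, villagers=s["villagers"] + 1)),
    (lambda s: True,                                          # always gather food
     lambda s: s.update(food=s["food"] + s["villagers"] * 5)),
]

for _ in range(20):                 # the engine loops over the rules forever
    for condition, action in rules:
        if condition(state):
            action(state)

print(state["villagers"])           # prints 10
```

The weakness described above is visible here: the behaviour is entirely fixed by the hand-written conditions, so a human who knows the rules can always find a counter.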
In this case, you get half a million anonymized game replays, a machine learning API, and a bridge between DeepMind's toolset and Blizzard's API.
It will be very interesting to see how deep learning can take on this.
I can imagine we will see pro-like reactions used against player tactics. When you script an AI, for instance for AOE2, you need to take a whole bunch of tactics into account, and once you know how an AI script behaves you can easily beat it. Even though the “new” AI script made for the latest AOE2HD releases is considered very difficult, you can beat it by tower rushing it, making it impossible for the computer to gain an economic advantage, since the towers keep it from gathering resources. The benefit of the AI is often that it can multitask.
I can imagine that with deep reinforcement learning the computer will generate tactics to counter pro gamers. I guess, however, that it will take a year or two before we see deep learning beat pro gamers.
I hope to see some very interesting games…
On the other hand, I am not sure it is such a good idea to put AI research effort into developing machine-learned war strategy.
Here is the paper.
DeepMind and OpenAI collaborated on an interesting project where they discovered how to use human feedback to provide the reward signal for a deep learning algorithm. The goal is to improve AI safety by correcting wrong behaviour through human intervention.
One example is helping a simulated robot perform a backflip.
The agent learns through reinforcement learning, and sometimes it asks a human which alternative is best; the human's choice is used to train a reward predictor, which the agent then uses in the reinforcement learning process.
The idea is that the algorithm tries different methods and presents alternatives to the human, who chooses which one best reaches the goal of performing a backflip. The algorithm continues generating its own reward estimates, keeps learning, and later checks in with the human to see how it has improved and which alternative is now best. To train a robot to perform a backflip, about 900 such inputs were needed.
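The reward-predictor step can be sketched as fitting a preference model from pairwise choices, here reduced to one dimension with a simulated human who simply prefers higher values. All numbers are illustrative, not from the paper:

```python
import math
import random

random.seed(1)

w = 0.0        # reward predictor: r(x) = w * x, initially knows nothing
lr = 0.5       # learning rate

for _ in range(500):
    a, b = random.random(), random.random()      # two candidate behaviour clips
    human_prefers_a = a > b                      # simulated human judgement
    # probability the model assigns to "a preferred": logistic of the reward gap
    p_a = 1.0 / (1.0 + math.exp(-(w * a - w * b)))
    # gradient ascent on the log-likelihood of the human's choice
    grad = ((1.0 if human_prefers_a else 0.0) - p_a) * (a - b)
    w += lr * grad

print(w > 0)   # the predictor has learned that higher is better
```

The learned `r(x)` then stands in for a hand-written reward function during reinforcement learning, with the human only consulted occasionally.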
This method is helpful for situations where it is difficult to create a reward function.
This iterative approach to learning means that a human can spot and correct any undesired behaviours, a crucial part of any safety system. The design also does not put an onerous burden on the human operator, who only has to review around 0.1% of the agent's behaviour to get it to do what they want. However, this can still mean reviewing several hundred to several thousand pairs of clips, something that will need to be reduced to make the method applicable to real-world problems.
Read about it here:
A blog post from DeepMind
OpenAI’s blog post
Maluuba, an AI startup acquired by Microsoft, achieved the highest possible score (999,990) in the very difficult Ms. Pac-Man. It used a divide-and-conquer style of reinforcement learning, where each positive or negative reward-giving element in the game is assigned an individual agent that suggests the move best suited to reaching that particular local goal. A “manager” agent receives these suggestions and decides which move the player shall perform to achieve the maximum reward.
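The divide-and-conquer idea can be sketched like this. The agent scores and moves are made up for illustration; the real system (Hybrid Reward Architecture) aggregates learned Q-values, one per reward source:

```python
MOVES = ["up", "down", "left", "right"]

def pellet_agent(move):
    """Scores moves by how well they approach one pellet (values invented)."""
    return {"up": 0.2, "down": 0.1, "left": 0.9, "right": 0.3}[move]

def ghost_agent(move):
    """Negative scores steer the player away from one ghost (values invented)."""
    return {"up": -0.8, "down": -0.1, "left": -0.2, "right": -0.4}[move]

def manager(agents, moves):
    """Pick the move with the highest summed suggestion across all agents."""
    return max(moves, key=lambda m: sum(agent(m) for agent in agents))

print(manager([pellet_agent, ghost_agent], MOVES))  # prints "left"
```

Each of the roughly 150 pellets and each ghost gets its own small agent in the real system, which is what makes the otherwise huge reward structure tractable.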
Read the paper here