Computers have long since beaten humans at chess; that was a great breakthrough and a milestone in AI.
The world's strongest chess engine, Stockfish, was recently dethroned by AlphaZero, a deep reinforcement learning AI from Google's DeepMind.
Here is a walkthrough of the third game, with further explanation on chess.com: “How Does AlphaZero Play Chess?”
Watch “AlphaZero vs Stockfish Chess Match: Game 3” from Chess on www.twitch.tv
You have probably heard of DeepMind's AlphaGo, which beat the world's best Go player at a game everyone said computers would need another ten years to master.
That version trained on millions of expert human games and then improved by playing against itself through reinforcement learning.
This version skips human gameplay entirely and learns through a novel self-play reinforcement learning method. It is given only the rules of the game, starts playing against itself, and keeps the adjusted versions that improve.
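The core loop can be sketched in a few lines. This toy is only an illustration of "play against yourself, keep what improves"; the real AlphaZero uses a deep network plus Monte Carlo tree search, not a single number as its policy:

```python
import random

def play_game(policy_a, policy_b):
    """Toy game: policy_a wins with probability proportional to its strength."""
    return 1 if random.random() < policy_a / (policy_a + policy_b) else 0

def self_play_train(rounds=200, games_per_round=50):
    champion = 1.0                                   # current best version
    for _ in range(rounds):
        # propose a slightly adjusted challenger version
        challenger = max(champion + random.uniform(-0.1, 0.1), 0.01)
        wins = sum(play_game(challenger, champion) for _ in range(games_per_round))
        if wins > games_per_round // 2:              # keep only versions that improve
            champion = challenger
    return champion

print(self_play_train())
```

The key property is that no human games are involved anywhere: the only signal is the win/loss outcome of self-play matches.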
Blog Post: https://deepmind.com/blog/alphago-zero-learning-scratch/
Research Page: https://deepmind.com/research/alphago/
If you would like to replicate the research, there is an open source project based on the paper: https://github.com/gcp/leela-zero. However, to get the same results as AlphaGo Zero you would need the same network weights, and to reach similar weights you would need access to the same computing power DeepMind had; on commodity computers it would take about 1,700 years. The project's aim is therefore a distributed effort to repeat the work.
Unity has released a new SDK supporting machine learning agents in the Unity game engine. This enables you to:
- Study complex multi-agent behaviors in realistic competitive and cooperative scenarios. This is a lot safer than doing it with robots.
- Study multi-agent behavior for industrial robotics, autonomous vehicles and other applications in a realistic environment.
- Create intelligent agents for your games.
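To get a feel for what such an SDK exposes, here is a generic agent-environment interaction loop with a stand-in toy environment. The `ToyEnv` class and its method signatures are made up for illustration; the actual Unity ML-Agents API differs:

```python
class ToyEnv:
    """Stand-in environment: move from position 0 to position 10."""
    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                  # action: +1 or -1
        self.pos += action
        done = self.pos >= 10
        reward = 1.0 if done else -0.01      # small penalty per step
        return self.pos, reward, done

def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total

print(run_episode(ToyEnv(), lambda obs: 1))  # policy: always move right
```

A learning algorithm would replace the fixed lambda with a policy that improves from the observed rewards; the environment itself (rendering, physics, other agents) is what the Unity side provides.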
The benefit is that it will be much easier to develop and test learning algorithms that can later be used in real life. There is also a potential danger: in the same way that we can test industrial robots, autonomous vehicles and so on before porting them into the real world, we will inevitably see very smart AI agents driving opponents in realistic war games, and these can also be ported into the real world.
Given that deep reinforcement learning already beats the best human players in a growing list of games, the more realistic the game gets, the scarier it is to imagine what would happen if you plugged such an AI into fighter jets or autonomous tanks.
Paper: A Brief Survey of Deep Reinforcement Learning
Authors: Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, Anil Anthony Bharath
Submitted: 19 Aug 2017
Read the PDF
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
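To give a flavour of the value-based methods the survey covers, here is a minimal tabular Q-learning sketch on a toy chain problem. A deep Q-network replaces the lookup table below with a neural network reading raw pixels; the problem and hyperparameters here are invented for illustration:

```python
import random

random.seed(0)

# Chain MDP: states 0..4, goal is state 4; actions move right (+1) or left (-1).
N_STATES, ACTIONS = 5, (+1, -1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1

for _ in range(2000):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Bellman update: move Q(s,a) towards reward plus discounted best next value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the greedy action in every non-terminal state is "move right"
print(all(Q[(s, +1)] > Q[(s, -1)] for s in range(N_STATES - 1)))
```

Policy-based methods like TRPO and actor-critic methods like A3C, also covered in the survey, instead learn the policy directly rather than deriving it from a value table.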
OpenAI developed an AI that beats the best professional Dota 2 players in the world in 1-on-1 games. It does not use imitation learning or tree search. Instead it learns by playing against a copy of itself, continuously improving. The game is very complicated, and if you coded the AI by hand you would probably end up with a rather poor player; by having the computer teach itself to play, it learns a lot of tactics.
Read more at:
Here are tactics it learned by itself:
AIs learning to play Atari games are very impressive, and beating Go champions was an eye-opener to the world. Now DeepMind, together with Blizzard, has released StarCraft II as an AI research environment.
It will be very interesting to see what happens and to try it out.
I have attempted to create AI scripts for Age of Empires II (which is the best game ever, by the way), and there are quite good scripts for it. They are, however, limited by the API the AOE2 scripting engine exposes: the scripts are simply looped over and over, and when a condition is met, that particular rule is executed.
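That condition-rule loop can be sketched like this. The rules and resource numbers are invented for illustration; the real AOE2 scripting language is its own rule-based DSL:

```python
# Game state the rules read and modify
state = {"food": 0, "villagers": 3}

rules = [
    # (condition, action) pairs, checked on every pass of the loop
    (lambda s: s["villagers"] < 10 and s["food"] >= 50,       # can afford a villager
     lambda s: s.update(food=s["food"] - 50, villagers=s["villagers"] + 1)),
    (lambda s: True,                                          # always gather food
     lambda s: s.update(food=s["food"] + s["villagers"] * 5)),
]

for _ in range(20):                 # the engine loops over the rules forever
    for condition, action in rules:
        if condition(state):
            action(state)

print(state["villagers"])           # prints 10
```

The weakness described above is visible here: the behaviour is entirely fixed by the hand-written conditions, so a human who knows the rules can always find a counter.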
In this case, you get half a million anonymized game replays, a machine learning API, and a bridge between DeepMind's toolset and Blizzard's API.
It will be very interesting to see how deep learning can take on this.
I can imagine we will see pro-like reactions used against player tactics. When you script an AI, for instance for AOE2, you need to take a whole bunch of tactics into account, and once you know how an AI script behaves you can easily beat it. Even though the “new” AI script made for the latest AOE2HD releases is considered very difficult, you can beat it by tower rushing it, making it impossible for the computer to gain an economic advantage, since the towers keep it from gathering resources. The benefit of the AI is often that it can multitask.
I can imagine that with deep reinforcement learning the computer will generate tactics to counter pro gamers. I guess, however, that it will take a year or two before we see deep learning beat pro gamers.
I hope to see some very interesting games…
On the other hand, I am not sure it is such a good idea to put AI research effort into developing machine-learned war strategy.
Here is the paper.
DeepMind and OpenAI collaborated on an interesting project where they discovered how to use human feedback to provide the reward signal for a deep learning algorithm. The goal is to improve AI safety by correcting wrong behaviour through human intervention.
One example is helping a simulated robot perform a backflip.
The agent learns through reinforcement learning, and sometimes it asks a human which alternative is best; the human's choice is used to train a reward predictor, which the agent then uses in the reinforcement learning process.
The idea is that the algorithm tries different methods and presents alternatives to the human, who chooses which one best reaches the goal of performing a backflip. The algorithm continues generating its own reward estimates, keeps learning, and later checks in with the human to see how it has improved and which alternative is now best. To train a robot to perform a backflip, about 900 such inputs were needed.
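The reward-predictor step can be sketched as fitting a preference model from pairwise choices, here reduced to one dimension with a simulated human who simply prefers higher values. All numbers are illustrative, not from the paper:

```python
import math
import random

random.seed(1)

w = 0.0        # reward predictor: r(x) = w * x, initially knows nothing
lr = 0.5       # learning rate

for _ in range(500):
    a, b = random.random(), random.random()      # two candidate behaviour clips
    human_prefers_a = a > b                      # simulated human judgement
    # probability the model assigns to "a preferred": logistic of the reward gap
    p_a = 1.0 / (1.0 + math.exp(-(w * a - w * b)))
    # gradient ascent on the log-likelihood of the human's choice
    grad = ((1.0 if human_prefers_a else 0.0) - p_a) * (a - b)
    w += lr * grad

print(w > 0)   # the predictor has learned that higher is better
```

The learned `r(x)` then stands in for a hand-written reward function during reinforcement learning, with the human only consulted occasionally.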
This method is helpful for situations where it is difficult to create a reward function.
This iterative approach to learning means that a human can spot and correct any undesired behaviours, a crucial part of any safety system. The design also does not put an onerous burden on the human operator, who only has to review around 0.1% of the agent's behaviour to get it to do what they want. However, this can still mean reviewing several hundred to several thousand pairs of clips, something that will need to be reduced to make the method applicable to real-world problems.
Read about it here:
A blog post from DeepMind
OpenAI’s blog post
Maluuba, an AI startup acquired by Microsoft, achieved the highest possible score (999,990) in the very difficult Ms. Pac-Man. It used a divide-and-conquer style of reinforcement learning, where each positive or negative reward-giving element in the game is assigned an individual agent that suggests the move best suited to reaching that particular local goal. A “manager” agent receives these suggestions and decides which move the player shall perform to achieve the maximum reward.
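The divide-and-conquer idea can be sketched like this. The agent scores and moves are made up for illustration; the real system (Hybrid Reward Architecture) aggregates learned Q-values, one per reward source:

```python
MOVES = ["up", "down", "left", "right"]

def pellet_agent(move):
    """Scores moves by how well they approach one pellet (values invented)."""
    return {"up": 0.2, "down": 0.1, "left": 0.9, "right": 0.3}[move]

def ghost_agent(move):
    """Negative scores steer the player away from one ghost (values invented)."""
    return {"up": -0.8, "down": -0.1, "left": -0.2, "right": -0.4}[move]

def manager(agents, moves):
    """Pick the move with the highest summed suggestion across all agents."""
    return max(moves, key=lambda m: sum(agent(m) for agent in agents))

print(manager([pellet_agent, ghost_agent], MOVES))  # prints "left"
```

Each of the roughly 150 pellets and each ghost gets its own small agent in the real system, which is what makes the otherwise huge reward structure tractable.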
Read the paper here