Feedback from humans to help machines learn

DeepMind and OpenAI collaborated on an interesting project in which they showed how human feedback can supply the reward signal for a deep reinforcement learning algorithm. The goal is to improve AI safety by letting humans step in and correct wrong behavior.

One example is teaching a simulated robot to perform a backflip.
The agent learns through reinforcement learning, and every so often it asks a human which of two alternatives is better. The human's choice is used to train a reward predictor, and the agent uses that predictor's output as its reward in the reinforcement learning process.

The idea is that the algorithm tries different behaviors and presents alternatives to the human, who picks the one that comes closest to the goal of performing a backflip. The algorithm then carries on with its own reward estimates, keeps learning, and later checks in with the human again to see which alternative is now the best. Training the robot to perform a backflip took about 900 such human inputs.
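To make the reward predictor idea more concrete, here is a minimal sketch in Python. It assumes the predictor is a simple linear function of some per-step features (the real system uses a neural network), and that the chance of a human preferring one clip over another follows a logistic function of the difference in predicted total reward. The feature vectors, the `simulated_comparisons` helper and the hidden "true" weights are all hypothetical and only there so the example can run without a real human.

```python
import math
import random

def clip_return(weights, clip):
    """Predicted total reward of a clip (a list of per-step feature vectors)."""
    return sum(sum(w * x for w, x in zip(weights, step)) for step in clip)

def preference_probability(weights, clip_a, clip_b):
    """Predicted probability that the human prefers clip_a over clip_b."""
    diff = clip_return(weights, clip_a) - clip_return(weights, clip_b)
    diff = max(-50.0, min(50.0, diff))  # keep math.exp in a safe range
    return 1.0 / (1.0 + math.exp(-diff))

def train_reward_predictor(comparisons, n_features, lr=0.1, epochs=200):
    """Fit linear reward weights to (clip_a, clip_b, human_prefers_a) triples."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for clip_a, clip_b, human_prefers_a in comparisons:
            p = preference_probability(weights, clip_a, clip_b)
            error = (1.0 if human_prefers_a else 0.0) - p
            for i in range(n_features):
                # Gradient of the log-likelihood with respect to weight i
                feat_diff = sum(s[i] for s in clip_a) - sum(s[i] for s in clip_b)
                weights[i] += lr * error * feat_diff
    return weights

def simulated_comparisons(true_weights, n_pairs, clip_len=5, seed=0):
    """Hypothetical stand-in for the human: prefers the clip with the higher true return."""
    rng = random.Random(seed)
    n = len(true_weights)
    def make_clip():
        return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(clip_len)]
    pairs = []
    for _ in range(n_pairs):
        a, b = make_clip(), make_clip()
        pairs.append((a, b, clip_return(true_weights, a) > clip_return(true_weights, b)))
    return pairs
```

Calling `train_reward_predictor(simulated_comparisons([1.0, -0.5], 60), 2)` returns learned weights that the reinforcement learning agent could then use in place of a hand-written reward function. In the real setup the comparisons would come from a person watching pairs of short video clips.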

This method is helpful for situations where it is difficult to create a reward function.

This iterative approach to learning means that a human can spot and correct any undesired behaviours, a crucial part of any safety system. The design also does not put an onerous burden on the human operator, who only has to review around 0.1% of the agent’s behaviour to get it to do what they want. However, this can mean reviewing several hundred to several thousand pairs of clips, something that will need to be reduced to make it applicable to real-world problems.

Read about it here:
The Paper
A blog post from DeepMind
OpenAI’s blog post

Practical Deep Learning for Coders (course.fast.ai)

This is, in my opinion, the best free course for getting up to speed with the state of the art in deep learning. The site offers a free seven-week learning experience, taught by Jeremy Howard (Kaggle winner two years in a row, entrepreneur and generally nice guy) and Rachel Thomas (math PhD, data scientist, full-stack developer and Forbes-featured), two amazing people in AI. Their approach to teaching Deep Learning for Coders is that it should be accessible to as many people as possible, not just a select few. So instead of abstract, mathy lectures, they let you get your hands dirty from the first lecture and build your intuition for the field, so that you can create state-of-the-art deep learning solutions from day one.

After starting the course, I immediately realized that these are very talented educators who are sincere about their goal of making AI accessible to everyone and making it benefit others. What I especially like about the course is the way they approach the topic pedagogically. Their method is inspired by the book “Making Learning Whole: How Seven Principles of Teaching can Transform Education” by David Perkins. Perkins compares today's education with learning baseball:

If you learned baseball the way that math is taught, you would first learn about the shape of a parabola, and then you would learn the material science behind the stitching in baseballs and so forth. And twenty years later, after you have completed your PhD and post-doc, you would be taken to your first baseball game and you would be introduced to the rules of baseball. And then 10 years later you might get to hit. The way that baseball is taught in practice is we take the kid down to the baseball diamond and we say, “These people are playing baseball, would you like to play?” And they say, “Yeah! Sure I would”. “Perfect, stand here, I’m gonna throw this. Hit it. Ok, great, now run. Good, you’re playing baseball.”

That is why, in the first class of the course, they show you 7 lines of code that perform state-of-the-art image classification using deep learning. The same code works for any image classification task you want, as long as you structure your data the right way. You may not understand most of it at first, but as you adapt it to your own needs you will have to dig into more of the details, and that is how you learn.

The course consists of a 2-hour lecture each week, detailed lecture notes, a community-contributed wiki and Jupyter notebooks, which you will also do your assignments in. (There are also setup instructions for getting a GPU-equipped machine up and running on AWS.) In the first week's assignment you will submit an entry to the Kaggle competition for classifying cat and dog images. By applying what you learn, you will outperform what was the state of the art when the competition was launched in 2013.

http://course.fast.ai/

Hybrid Reward Architecture breaks the AI and human world record for Ms. Pac-Man

Maluuba, an AI startup acquired by Microsoft, achieved the highest possible score (999,990) in Ms. Pac-Man, a game that is very difficult to beat. It used a divide-and-conquer style of reinforcement learning: each element in the game that gives a positive or negative reward is assigned its own agent, and each agent suggests the move that is best for reaching its particular local goal. A “manager” agent receives these suggestions and decides which move the player should make in order to achieve the maximum reward.
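The division of labor between the per-element agents and the manager can be sketched roughly as follows. This is only an illustration of the structure, not Maluuba's implementation: in the real system each agent learns its values through reinforcement learning, whereas the hand-written distance score, the weights and the grid coordinates below are assumptions made up for the example.

```python
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def move(pos, action):
    """Position of the player after taking the given action."""
    dx, dy = ACTIONS[action]
    return (pos[0] + dx, pos[1] + dy)

def manhattan(a, b):
    """Grid distance between two positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def make_agent(target, weight):
    """Hypothetical agent responsible for one game element.

    A positive weight (e.g. a pellet) makes moves toward the target
    attractive; a negative weight (e.g. a ghost) makes them unattractive.
    """
    def agent(pos):
        return {a: weight / (1 + manhattan(move(pos, a), target)) for a in ACTIONS}
    return agent

def manager(agents, pos):
    """Add up every agent's score per action and pick the best action overall."""
    totals = {a: 0.0 for a in ACTIONS}
    for agent in agents:
        for action, score in agent(pos).items():
            totals[action] += score
    return max(ACTIONS, key=lambda a: totals[a])
```

For example, with a single pellet agent `make_agent((3, 0), 10)` and the player at `(0, 0)`, the manager moves right toward the pellet. Adding a ghost agent `make_agent((1, 0), -100)` between the two makes the manager avoid that move, because the ghost agent's strongly negative score outweighs the pellet agent's suggestion.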

http://www.maluuba.com/blog/2017/6/14/hra

 

Read the paper here