You have probably heard of DeepMind's AlphaGo, which beat the world's best Go player at a game that many said computers would need another ten years to master.
That version was first trained on millions of moves from expert human games and then improved by playing against itself through reinforcement learning.
The new version, AlphaGo Zero, skips human games entirely and learns only by playing against itself, using a novel reinforcement learning method. It starts with nothing but the rules of the game, plays against itself, adjusts its play, and keeps the versions that improve.
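The "play against itself and keep the versions that improve" loop can be sketched in a few lines. This is a toy illustration under loud assumptions, not DeepMind's method: the game is single-pile Nim instead of Go, a "version" is a table of move weights instead of a neural network, and "making adjustments" is done by random perturbation, a swapped-in stand-in for the network training AlphaGo Zero actually performs. All names (`self_play_training`, `mutate`, and so on) and the 55% promotion threshold are hypothetical choices for this sketch.

```python
import random

# Toy game (assumption, not Go): single-pile Nim. Players alternately
# remove 1-3 stones; whoever takes the last stone wins.
PILE = 15
MOVES = (1, 2, 3)


def random_policy(rng):
    """A hypothetical 'version': preference weights over moves for each pile size."""
    return {n: [max(1e-6, rng.random()) for _ in MOVES] for n in range(1, PILE + 1)}


def choose(policy, pile, rng):
    """Sample a legal move according to the policy's weights."""
    legal = [m for m in MOVES if m <= pile]
    weights = policy[pile][: len(legal)]
    return rng.choices(legal, weights=weights)[0]


def play(first, second, rng):
    """Play one game; return 0 if `first` takes the last stone, else 1."""
    players = (first, second)
    pile, turn = PILE, 0
    while True:
        pile -= choose(players[turn], pile, rng)
        if pile == 0:
            return turn
        turn = 1 - turn


def win_rate(candidate, champion, games, rng):
    """Fraction of games the candidate wins against the champion."""
    wins = 0
    for g in range(games):
        # Alternate who moves first so neither side keeps a fixed advantage.
        if g % 2 == 0:
            wins += play(candidate, champion, rng) == 0
        else:
            wins += play(champion, candidate, rng) == 1
    return wins / games


def mutate(policy, rng, scale=0.3):
    """Propose a new version by randomly perturbing the current best one."""
    return {n: [max(1e-6, w + rng.gauss(0, scale)) for w in ws]
            for n, ws in policy.items()}


def self_play_training(generations=100, games=100, threshold=0.55, seed=0):
    """Start from a random policy; keep a candidate only if it beats the champion."""
    rng = random.Random(seed)
    champion = random_policy(rng)
    promotions = 0
    for _ in range(generations):
        candidate = mutate(champion, rng)
        if win_rate(candidate, champion, games, rng) > threshold:
            champion = candidate
            promotions += 1
    return champion, promotions
```

Calling `self_play_training()` returns the final champion policy and the number of candidates that were promoted. Gating on a head-to-head win rate is one way to make "keeping the versions that improve" concrete: a change survives only if it beats the current best.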
Blog Post: https://deepmind.com/blog/alphago-zero-learning-scratch/
Research Page: https://deepmind.com/research/alphago/
If you would like to replicate the research, there is an open-source project based on the paper: https://github.com/gcp/leela-zero. However, to get the same results as AlphaGo Zero you would need the same network weights, and producing comparable weights would require the kind of computing power DeepMind had. It would take an estimated 1700 years on commodity hardware. The project's aim is to repeat the work as a distributed effort.