The transformer … “explained”?
Attention is all you need
The transformer architecture is at the core of a lot of the recent hot developments in Deep Learning, especially NLP: GPT-2, BERT, and apparently AlphaStar as well. Here is a good first, non-technical explanation of what lies behind the architecture: nostalgebraist.tumblr.com
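The title's claim that "attention is all you need" refers to one core operation. Scaled dot-product attention is softmax(QK^T / √d_k) V: each query is compared with every key, the scores become weights, and the output is a weighted average of the values. The sketch below implements that formula in plain Python. It is illustrative only: it leaves out the learned projections, multiple heads, masking, and everything else a real transformer layer contains. The helper names (`softmax`, `matmul`, `transpose`, `scaled_dot_product_attention`) are hypothetical and are not taken from any library.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain triple-loop matrix product: (n x k) @ (k x m) -> (n x m).
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V, one output row per query row."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # similarity of each query with each key
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)  # weighted average of the value rows


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    # The query matches the first key more closely, so the first value dominates.
    print(scaled_dot_product_attention(q, k, v))
```

Because the softmax weights in each row sum to 1, every output row is a convex combination of the value rows. When all keys are identical, the weights are uniform, and the output is simply the mean of the values.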
The challenging part lies in getting lots of good training data and in finding a good training objective.