GPT

GPT-2

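Roughly as good as BERT. But the selling point is zero-shot learning: the model performs downstream tasks without any task-specific fine-tuning, purely by conditioning on a text prompt (e.g. appending "TL;DR:" to an article to get a summary).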
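A minimal sketch of what "zero-shot" means in practice: the task is encoded in the input text itself, and whatever the language model generates after the cue is taken as the answer. The "TL;DR:" cue follows the GPT-2 paper's summarization setup. The "A:" template, the TEMPLATES dict and the build_zero_shot_prompt helper are hypothetical names for illustration. No model call is shown; the resulting string would be fed to any GPT-2 implementation for generation.

```python
# Hypothetical prompt templates. "TL;DR:" mirrors the GPT-2 paper's
# summarization cue; the "A:" answer cue is an illustrative assumption.
TEMPLATES = {
    "summarize": "{text}\nTL;DR:",
    "answer": "{text}\nA:",
}


def build_zero_shot_prompt(task: str, text: str) -> str:
    """Return a prompt whose natural continuation is the task's output.

    No weights are updated: the task is specified only through the text.
    """
    try:
        template = TEMPLATES[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
    return template.format(text=text)


prompt = build_zero_shot_prompt(
    "summarize", "GPT-2 is a large language model trained on web text."
)
print(prompt)
```

The design choice is that the same frozen model serves every task; only the prompt changes.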

1.5B parameters (more than 10x larger than GPT)
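Where the size gap comes from can be checked with a parameter count. The sketch below assumes the standard decoder-block layout used by public implementations (two LayerNorms, fused QKV projection, attention output projection, 4x-wide MLP, all with biases, output layer tied to the token embedding). The hyperparameters are assumptions taken from the published configs: GPT with 12 layers, width 768, 40,478-token vocabulary, 512-token context and no final LayerNorm; the largest GPT-2 with 48 layers, width 1600, 50,257-token vocabulary and 1024-token context. DecoderConfig and count_params are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    n_layer: int   # number of transformer blocks
    d_model: int   # hidden width
    n_vocab: int   # token vocabulary size
    n_ctx: int     # learned position embeddings (context length)
    final_layer_norm: bool = True  # GPT-2 adds one after the last block


def count_params(cfg: DecoderConfig) -> int:
    d = cfg.d_model
    per_block = (
        2 * (2 * d)              # ln_1 and ln_2: gain + bias each
        + (d * 3 * d + 3 * d)    # fused Q, K, V projection
        + (d * d + d)            # attention output projection
        + (d * 4 * d + 4 * d)    # MLP expand to 4*d
        + (4 * d * d + d)        # MLP contract back to d
    )  # totals 12*d^2 + 13*d
    embeddings = cfg.n_vocab * d + cfg.n_ctx * d  # token + position tables
    final_ln = 2 * d if cfg.final_layer_norm else 0
    # The output softmax layer reuses the token embedding, so it adds nothing.
    return cfg.n_layer * per_block + embeddings + final_ln


GPT1 = DecoderConfig(n_layer=12, d_model=768, n_vocab=40478, n_ctx=512,
                     final_layer_norm=False)
GPT2_XL = DecoderConfig(n_layer=48, d_model=1600, n_vocab=50257, n_ctx=1024)

gpt1 = count_params(GPT1)
gpt2 = count_params(GPT2_XL)
print(f"GPT:   {gpt1:,}")
print(f"GPT-2: {gpt2:,}")
print(f"ratio: {gpt2 / gpt1:.1f}x")
```

Under these assumptions GPT comes out at about 117M and GPT-2 at about 1.56B, a ratio of roughly 13x. Almost all of the growth is in the 12*d^2 term per block, multiplied by four times as many layers.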

Next, see GPT-3.