GPT-2
Just as good as BERT, but the selling point is zero-shot learning: it handles downstream tasks without task-specific fine-tuning.
1.5B parameters (about 10x larger than GPT)
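Zero-shot here means the task is expressed purely as text for the model to continue, with no gradient updates and no labeled examples. A minimal sketch of that framing, assuming a hypothetical helper `build_zero_shot_prompt` and illustrative templates (the `TL;DR:` cue is the kind of hint used to induce summarization; the `qa` template is an assumption for illustration only):

```python
# Hypothetical helper: the function name and templates are illustrative
# assumptions, not part of any GPT-2 release or library.
def build_zero_shot_prompt(task: str, text: str) -> str:
    """Frame a task as plain text continuation (no fine-tuning, no examples)."""
    templates = {
        # Appending "TL;DR:" nudges a language model toward a summary.
        "summarize": "{text}\nTL;DR:",
        # Assumed question-answering cue: the model continues after "A:".
        "qa": "Q: {text}\nA:",
    }
    if task not in templates:
        raise ValueError(f"unknown task: {task}")
    return templates[task].format(text=text)


prompt = build_zero_shot_prompt(
    "summarize", "GPT-2 is a 1.5B-parameter language model."
)
print(prompt)
```

The resulting string would be passed unchanged to a pretrained language model; whatever it generates after the cue is taken as the answer.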
Next, see GPT-3.