
Mar 22, 2025, 1 min read

Tokenizer, Byte Pair Encoding

tiktoken

This is what OpenAI uses. It uses Byte Pair Encoding under the hood. Essentially, you start with a base tokenization scheme with a very small vocabulary (256 tokens, one per byte). Then, you iteratively merge the most frequent adjacent pair of tokens into a new token, growing the vocabulary one merge at a time. A toy version of this loop is sketched below.
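A minimal sketch of that merge loop (toy Python, not OpenAI's actual implementation; `train_bpe` and the sample text are made up for illustration):

```python
from collections import Counter

def train_bpe(text: str, num_merges: int) -> list[tuple[int, int]]:
    """Toy BPE trainer: start from raw bytes (256-token base vocab)
    and repeatedly merge the most frequent adjacent token pair."""
    ids = list(text.encode("utf-8"))  # base tokens: one id per byte
    merges = []
    next_id = 256  # new token ids start just past the byte range
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]  # most frequent adjacent pair
        merges.append(pair)
        # replace every occurrence of the pair with the new token id
        new_ids, i = [], 0
        while i < len(ids):
            if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
                new_ids.append(next_id)
                i += 2
            else:
                new_ids.append(ids[i])
                i += 1
        ids = new_ids
        next_id += 1
    return merges

merges = train_bpe("low lower lowest low low", num_merges=5)
print(merges)  # first merges are e.g. 'l'+'o', then 'lo'+'w'
```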

GPT-2 has a vocabulary of ~50k tokens.

The GPT-4 base model uses ~100k tokens.

  • Source: https://www.youtube.com/watch?v=7xTGNNLPyMI
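You can check these vocabulary sizes directly with the tiktoken package (a short sketch; the exact `n_vocab` values are printed rather than hard-coded, since the precise counts are worth verifying locally):

```python
import tiktoken

# GPT-2's encoding (~50k tokens) vs. GPT-4's cl100k_base (~100k tokens)
gpt2 = tiktoken.get_encoding("gpt2")
gpt4 = tiktoken.get_encoding("cl100k_base")
print(gpt2.n_vocab, gpt4.n_vocab)  # roughly 50k vs 100k

ids = gpt4.encode("Byte Pair Encoding")
print(ids)               # token ids under cl100k_base
print(gpt4.decode(ids))  # round-trips back to the original string
```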

Demo

  • https://tiktokenizer.vercel.app/

What uses tiktoken?

  • GPT-4
  • Llama 3
