Tokenizer
SentencePiece
SentencePiece, developed by Google, implements subword tokenization algorithms such as byte-pair encoding (BPE) and the unigram language model.
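The Python sketch below illustrates the two subword algorithms named in this entry at toy scale. It is not SentencePiece's implementation or API. `learn_bpe` repeatedly merges the most frequent adjacent symbol pair. `unigram_segment` picks the most probable segmentation under a fixed unigram model using a Viterbi search. All function names are hypothetical, and the piece probabilities are invented for illustration. Unigram training (EM plus iterative vocabulary pruning in the original method) is omitted. Following SentencePiece, word starts are marked with the meta-symbol U+2581 (▁).

```python
import math
from collections import Counter

def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(text, num_merges):
    """Toy BPE: learn up to `num_merges` merge rules from whitespace-separated text."""
    # Prefix each word with U+2581 (as SentencePiece does), then split into characters.
    vocab = Counter(tuple("\u2581" + w) for w in text.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair; max() keeps the first pair seen on ties.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def bpe_encode(word, merges):
    """Segment one word by replaying the learned merges in order."""
    symbols = tuple("\u2581" + word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

def unigram_segment(text, logprob):
    """Viterbi search for the most probable segmentation under fixed piece log-probs."""
    n = len(text)
    # best[i] = (log-prob of best segmentation of text[:i], start index of its last piece)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in logprob and best[start][0] > -math.inf:
                score = best[start][0] + logprob[piece]
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        raise ValueError("text cannot be covered by the given pieces")
    # Follow back-pointers from the end to recover the pieces.
    pieces, end = [], n
    while end > 0:
        start = best[end][1]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]

merges = learn_bpe("low low low lower lowest", num_merges=4)
print(merges)                        # [('▁', 'l'), ('▁l', 'o'), ('▁lo', 'w'), ('▁low', 'e')]
print(bpe_encode("lowest", merges))  # ['▁lowe', 's', 't']
print(bpe_encode("slow", merges))    # ['▁', 's', 'l', 'o', 'w']

# Hypothetical piece probabilities, chosen only for illustration.
logprob = {p: math.log(q) for p, q in {
    "\u2581low": 0.2, "er": 0.1, "\u2581lo": 0.05, "wer": 0.01,
    "\u2581": 0.01, "l": 0.01, "o": 0.01, "w": 0.01, "e": 0.01, "r": 0.01,
}.items()}
print(unigram_segment("\u2581lower", logprob))  # ['▁low', 'er']
```

The unseen word `slow` falls back to single characters because no learned merge applies to it. The unigram segmenter prefers `▁low` + `er` (probability 0.02) over alternatives such as `▁lo` + `wer` (0.0005).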