BPE
- Japanese and Korean Voice Search - https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37842.pdf - 2012 - Proposes wordpieces
- Neural Machine Translation of Rare Words with Subword Units - https://arxiv.org/abs/1508.07909 - 2016 - Adapts byte pair encoding to subword segmentation for NMT
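- Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates - 2018 - Introduces unigram language model segmentation and samples multiple segmentations during training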
- SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing - 2018
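The merge-learning loop behind BPE (as adapted in the Sennrich et al. paper above) can be sketched roughly as follows. This is a minimal illustration, not the paper's reference code: words are stored as space-separated symbols ending in a `</w>` end-of-word marker, the most frequent adjacent pair is merged repeatedly, and new words are segmented by replaying the learned merges in order. The function names (`learn_bpe`, `apply_bpe`, etc.) are hypothetical, and the tie-breaking rule (the first pair encountered wins) is an assumption of this sketch.

```python
import re
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of 'a b' with 'ab' in each word."""
    bigram = re.escape(" ".join(pair))
    # Lookarounds ensure we only match whole symbols, not parts of longer ones.
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(lambda _: merged, w): f for w, f in vocab.items()}


def learn_bpe(word_counts, num_merges):
    """Learn up to num_merges merge operations from a word-frequency dict."""
    # Start from characters plus an end-of-word marker.
    vocab = {" ".join(w) + " </w>": f for w, f in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties: first pair encountered wins
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying merges in learned order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges, vocab = learn_bpe(corpus, 5)
    print(merges)
    print(apply_bpe("lowest", merges))
```

With this toy corpus the first five merges are `('e','s')`, `('es','t')`, `('est','</w>')`, `('l','o')`, `('lo','w')`, so the unseen word "lowest" is segmented into `['low', 'est</w>']`. This shows how BPE handles rare or unseen words by composing them from frequent subwords.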