Effect of tokenization on transformers for biological sequences.
Edo Dotan, Gal Jaschek, Tal Pupko, Yonatan Belinkov
Published in: Bioinformatics (Oxford, England) (2024)
Code, data, and trained tokenizers are available at https://github.com/technion-cs-nlp/BiologicalTokenizers.