
Arcade100kTokenizer

Arcade100k is a BPE tokenizer that extends OpenAI’s tiktoken cl100k_base encoding with special tokens for code and with splitting of numbers into individual digits.

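from transformers import AutoTokenizer
# trust_remote_code=True allows loading the custom tokenizer code shipped in the repository.
tokenizer = AutoTokenizer.from_pretrained("stabilityai/arcade100k", trust_remote_code=True)
# return_tensors='pt' returns PyTorch tensors, so PyTorch must be installed.
tokenizer("hello, world!", return_tensors='pt')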
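
To make the digit-splitting idea from the description concrete, here is a minimal sketch of what splitting numbers into individual digits looks like at the pre-tokenization level. It is an illustration only, not the Arcade100k implementation: `split_digits` is a hypothetical helper, and the regular expression is an assumption chosen for clarity rather than the pattern the tokenizer actually uses.

```python
import re


# Hypothetical helper for illustration; not part of Arcade100k or tiktoken.
def split_digits(text: str) -> list[str]:
    """Split text into pieces so that every digit is its own piece."""
    # Match either a single digit or a run of non-digit characters.
    return re.findall(r"\d|\D+", text)


pieces = split_digits("year 2024")
print(pieces)  # ['year ', '2', '0', '2', '4']
```

With this kind of splitting, a number such as 2024 is never merged into one multi-digit piece, so each digit is encoded separately regardless of the number's length.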

Citation

@article{bellagente2024stable,
  title={Stable LM 2 1.6B Technical Report},
  author={Bellagente, Marco and Tow, Jonathan and Mahan, Dakota and Phung, Duy and Zhuravinskyi, Maksym and Adithyan, Reshinth and Baicoianu, James and Brooks, Ben and Cooper, Nathan and Datta, Ashish and others},
  journal={arXiv preprint arXiv:2402.17834},
  year={2024}
}