Safetensors version?

#2
by matatonic - opened

Any chance you could also upload a safetensors version?

* Use the following script to convert your local pytorch_model .bin files to float16 (you can also choose bfloat16) and safetensors all in one go:

Example to convert WizardLM 70B V1.0 directly to float16 safetensors in 10GB shards:

python convert-to-safetensors.py ~/original/WizardLM-70B-V1.0 --output ~/float16_safetensored/WizardLM-70B-V1.0 --max-shard-size 10GB

Use --bf16 if you'd like to try bfloat16 instead, but note that there are concerns about quantization quality (see https://github.com/turboderp/exllamav2/issues/30#issuecomment-1719009289). A sketch of the same conversion using stock transformers appears after this list.

** Use any one of the following scripts to convert your local pytorch_model .bin files to safetensors:
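If you'd rather not hunt down a dedicated script, here's a minimal sketch of the same conversion using only transformers' built-in save_pretrained (not the script above; paths are placeholders, and it assumes transformers and safetensors are installed):

```python
# Minimal sketch: load a pickle-based checkpoint, cast to float16, and
# re-save it as sharded safetensors using transformers' built-in APIs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "original/WizardLM-70B-V1.0"            # placeholder input path
dst = "float16_safetensored/WizardLM-70B-V1.0"  # placeholder output path

# low_cpu_mem_usage=True reduces peak RAM while loading a large checkpoint
model = AutoModelForCausalLM.from_pretrained(
    src, torch_dtype=torch.float16, low_cpu_mem_usage=True
)
model.save_pretrained(dst, safe_serialization=True, max_shard_size="10GB")

# carry the tokenizer over so the output directory is self-contained
AutoTokenizer.from_pretrained(src).save_pretrained(dst)
```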

Cognitive Computations org

Thanks, but I don't really see why I would want/need to do that.

Hi, creator of safetensors here.

The issue with pickle files is that they are not safe: anyone can write malicious code that's going to be executed on your machine.
You can check https://huggingface.co/Narsil/totallysafe. Weird things will happen as soon as you open this file, and things will continue to be tricky after you have closed your Python session.

I promise this is harmless, since I actually wrote it, but it should give you an impression of how BAD pickle can be.
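To show the mechanism itself (a minimal sketch of my own here, not the contents of that repo): pickle lets any class define __reduce__, and unpickling will call whatever that method returns.

```python
import os
import pickle

class TotallySafe:
    # pickle records the (callable, args) pair returned by __reduce__;
    # unpickling then CALLS that callable with those args
    def __reduce__(self):
        return (os.system, ("echo pwned: this ran at load time",))

payload = pickle.dumps(TotallySafe())
pickle.loads(payload)  # runs the shell command; arbitrary code execution
```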

Users will not know you, and therefore won't necessarily trust what you output. Having a safetensors file means they are at least safe from arbitrary code execution; the worst that could happen is loading a model that isn't fit for what they intend to do.
You can check out more reasons here: https://github.com/huggingface/safetensors#yet-another-format-

For instance, it loads files 2x faster than pickle (actually 10x on CPU).
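As a quick sketch of the loading side (the file name is a placeholder): safe_open memory-maps the file and only materializes the tensors you ask for, with nothing in the file that can execute code.

```python
from safetensors import safe_open

# the file is memory-mapped; tensors are materialized only on request
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        print(name, tuple(tensor.shape), tensor.dtype)
```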

Cognitive Computations org

I understand that, but as the author of the model I know what's in it. Also, this is primarily repackaged and quantized by TheBloke, and he and I trust each other.

I just train models, he packages them for distribution.

I don't see how adding an extra step to the publishing process benefits me personally.

The person you should talk to is @winglian . If he updates Axolotl so it uses the safetensors format by default, then that is what I'll publish.

My original reason for asking was to convert to exl2 quants, which TheBloke doesn't do. I had been unable to convert 70B models so far using the scripts I had (resource limits); however, the Panchovix script linked above by @Thireus works perfectly and doesn't use crazy resources. Thanks! With that I was able to convert it myself.
I'd add, though, that the Hugging Face built-in conversion tools also didn't work on this model, so my thinking is that it's a waste for so many people to convert it using various tools of varying quality when it could be converted once at the source and quality-assured. TheBloke used to release HF fp16 formats for some models, which was handy, but he doesn't do that often anymore.
Regardless, as for my original request: I don't need you to provide it anymore.

Cognitive Computations org

Fair enough. I think you might wanna chat with wing though; safetensors would certainly get more adoption if it were the default output format in Axolotl.

ehartford changed discussion status to closed
