Thrilled to unveil DS-MoE: a dense-training, sparse-inference scheme that improves the computational and memory efficiency of your MoE models! 🎉
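For readers unfamiliar with the idea, here is a minimal PyTorch-style sketch of what dense training with sparse inference can look like: every expert is active during training (mixed by the full router distribution, so all experts receive gradients), while only the top-k experts run at inference. The `DSMoELayer` class, the plain softmax router, and the hyperparameters are illustrative assumptions, not the paper's exact method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSMoELayer(nn.Module):
    """Illustrative MoE layer: dense routing in training, top-k sparse routing at inference."""

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, d_model)
        gates = F.softmax(self.router(x), dim=-1)  # (batch, n_experts)
        if self.training:
            # Dense training: every expert processes every token, and the
            # outputs are mixed by the full gate distribution.
            out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, d_model)
            return (gates.unsqueeze(-1) * out).sum(dim=1)
        # Sparse inference: keep only the top-k gates, renormalize them,
        # and run just those experts to save compute and memory.
        top_vals, top_idx = gates.topk(self.top_k, dim=-1)
        top_vals = top_vals / top_vals.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = top_idx[:, k] == e
                if mask.any():
                    out[mask] += top_vals[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out
```

Because all experts see gradients in training, this sidesteps the under-trained-expert problem of sparse training, while inference still enjoys the usual MoE compute savings; the actual DS-MoE recipe may add further losses and routing details beyond this sketch.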
Current LLMs are susceptible to generating toxic, harmful, and even dangerous content, and their outputs can reflect gender and racial biases.
Addressing these concerns, we present the world's first open-source multilingual language model red-teamed in line with the Biden-Harris Executive Order: Aurora-M.
The model is trained on five languages: English, Hindi, Japanese, Vietnamese, and Finnish.