text-generation-inference (Text Generation Inference)

Text Generation Inference (TGI) is a solution built for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation using Tensor Parallelism and dynamic batching for the most popular open-source LLMs, including StarCoder, BLOOM, GPT-NeoX, Llama, and T5. It is already used by customers such as IBM, Grammarly, and the Open-Assistant initiative. TGI implements optimizations for all supported model architectures, including:

Tensor Parallelism and custom CUDA kernels
Optimized transformers code for inference using Flash Attention and Paged Attention on the most popular architectures
Quantization with bitsandbytes or GPTQ
Continuous batching of incoming requests for increased total throughput
Accelerated weight loading (faster start-up time) with safetensors
Logits warpers (temperature scaling, top-k, repetition penalty, ...)
Watermarking with "A Watermark for Large Language Models"
Stop sequences and log probabilities
Token streaming using Server-Sent Events (SSE)
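
The token streaming feature listed above can be illustrated from the client side. The following is a minimal sketch, not the official client: it assumes each Server-Sent Event arrives as a `data:` line carrying a JSON object with a `token` field that has `text` and `special` keys. The helper name `iter_sse_tokens` and the sample events are hypothetical, and no network call is made.

```python
import json
from typing import Iterable, Iterator


def iter_sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield token text from SSE lines (hypothetical helper, assumed payload shape)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank event separators, comment lines, and any non-data fields.
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):].strip())
        token = payload.get("token") or {}
        # Assumption: special tokens (e.g. end-of-sequence) are flagged and not shown.
        if token.get("special"):
            continue
        text = token.get("text")
        if text is not None:
            yield text


# Hypothetical sample events standing in for a streamed HTTP response body.
sample_stream = [
    'data:{"token":{"id":1,"text":"Hello","special":false}}',
    "",
    'data:{"token":{"id":2,"text":" world","special":false}}',
    "",
    'data:{"token":{"id":3,"text":"</s>","special":true}}',
]

generated = "".join(iter_sse_tokens(sample_stream))
print(generated)
```

In practice the lines would come from an HTTP response read incrementally, so each token can be displayed as soon as it is generated instead of waiting for the full completion.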

Currently optimized architectures

Check out the source code 👉

the server backend: https://github.com/huggingface/text-generation-inference
the Chat UI: https://huggingface.co/spaces/text-generation-inference/chat-ui

Check out examples

spaces 2

oasst-sft-1-pythia-12b

chat-ui

models 7

text-generation-inference/Mistral-7B-Instruct-v0.2

text-generation-inference/zephyr-orpo-141b-A35b-v0.1-medusa

text-generation-inference/commandrplus-medusa

text-generation-inference/Mistral-7B-Instruct-v0.2-medusa

text-generation-inference/gemma-7b-it-medusa

text-generation-inference/Nous-Hermes-2-Mixtral-8x7B-DPO-medusa

text-generation-inference/Mixtral-8x7B-Instruct-v0.1-medusa
