philschmid (HF staff) committed
Commit: b1b51d0
Parent(s): 94d995f

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -10,7 +10,7 @@ pinned: false
 Text-Generation-Inference is, an open-source, purpose-built solution for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation using Tensor Parallelism and dynamic batching for the most popular open-source LLMs, including StarCoder, BLOOM, GPT-NeoX, Llama, and T5. Text Generation Inference is already used by customers such as IBM, Grammarly, and the Open-Assistant initiative implements optimization for all supported model architectures, including:

 - Tensor Parallelism and custom cuda kernels
-- OOptimized transformers code for inference using flash-attention and Paged Attention on the most popular architectures
+- Optimized transformers code for inference using flash-attention and Paged Attention on the most popular architectures
 - Quantization with bitsandbytes or gptq
 - Continuous batching of incoming requests for increased total throughput
 - Accelerated weight loading (start-up time) with safetensors
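
For context on the features listed in the diff above: once a TGI server is running, text generation is exposed over HTTP. Below is a minimal client sketch, assuming a server already serving locally on port 8080 (the host, port, and prompt are illustrative assumptions; the `/generate` route, `inputs` field, and `max_new_tokens` parameter follow TGI's documented API).

```python
# Minimal sketch: querying a running Text-Generation-Inference server.
# Assumes a TGI instance is already serving on localhost:8080 (assumption);
# the /generate endpoint and payload shape follow TGI's HTTP API.
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is continuous batching?",  # prompt text
        "parameters": {"max_new_tokens": 50},      # cap on generated tokens
    },
    timeout=60,
)
response.raise_for_status()

# TGI returns a JSON object containing the generated continuation.
print(response.json()["generated_text"])
```

Continuous batching means concurrent requests like this one are merged into the running batch on the server side, so the client code stays the same regardless of load.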