Minimum requirements for inference

#2 by bilelm - opened

Hello,
Thank you for sharing this model!
Could you specify the minimum requirements needed to run this model for inference?

Technology Innovation Institute org

It requires at least 14 GB of memory; the smallest GPU I've tried is an A10, which works well.

It's also trained in bf16, which is only available on Ampere and later GPUs; I would expect some performance degradation when running it in fp16 instead.
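
For concreteness, here is a minimal bf16 loading sketch (an illustrative example, not from this thread: it assumes the `tiiuae/falcon-7b` repo id, a recent transformers release, and the accelerate package for `device_map="auto"`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"  # assumption: swap in falcon-7b-instruct if that's the repo in question
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fall back to torch.float16 on pre-Ampere GPUs, with possible quality loss
    trust_remote_code=True,      # Falcon shipped custom modeling code at release
    device_map="auto",           # requires the accelerate package
)
```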

Thank you!
I will try it on an A5000 (24 GB); I hope that's enough.

I am running the given code on Windows (CPU, 32 GB RAM), but it keeps running for 2+ hours without printing the results.
Does anyone have an idea how to solve this?

I am trying to load this model on Colab, but it doesn't load onto the GPU.
What am I missing? I am using the code provided in the model card and installing the transformers library, but the model still does not load onto the GPU.

I tried it on a 40 GB A100 and it worked, but slowly: a single input took about 10 minutes.
I then got an 80 GB A100, but after loading the checkpoint it crashed at:

```
return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 79.32 GiB total capacity; 77.15 GiB already allocated; 832.00 KiB free; 78.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

What am I missing?
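
For reference, the allocator hint in that traceback can be applied before the first CUDA allocation; a minimal sketch, where `128` is an illustrative value rather than a recommendation from this thread:

```python
# Set the allocator option named in the OOM message before torch touches the GPU.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # illustrative split size, not a tuned value

import torch  # import (or at least first allocate) only after setting the variable
```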

Technology Innovation Institute org

@zynos it's unlikely you will get anything in a reasonable time on a CPU; you really need a GPU for this sort of model.

@akashcollectiv are you sure you are not trying to load Falcon-40B instead? The 7B should fit fine on an A100 80GB.

FalconLLM changed discussion status to closed

Are all of you working with PyTorch 2?

Me too; I get OOM when the sequence length exceeds ~1200,
using an A100 80GB, bf16, and inference only (no_grad) for the Falcon-7B model.
And yes, I'm using PyTorch 2.0.
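
A minimal sketch of memory-lean generation, assuming `model` and `tokenizer` are already loaded as in the model card and `prompt` is your input string; `torch.inference_mode()` avoids autograd bookkeeping, and `max_new_tokens` caps the KV-cache growth that tends to blow up at long sequence lengths:

```python
import torch

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():  # no autograd state is kept during generation
    output_ids = model.generate(
        **inputs,
        max_new_tokens=200,  # bound the generated length rather than total max_length
        do_sample=True,
        top_k=10,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```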

> I tried it on a 40 GB A100 and it worked, but slowly: a single input took about 10 minutes. I then got an 80 GB A100, but after loading the checkpoint it crashed with `torch.cuda.OutOfMemoryError` (full traceback above). What am I missing?

Have you played with batch size? Halving the batch size seems to help.

> Have you played with batch size? Halving the batch size seems to help.

The batch size I run with is 1.

Can anyone tell me the minimum hardware requirements for falcon-7b-instruct? I want to use it for question answering over given context/document data.

On Colab, with the model card's inference call:

```python
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
```

A100: 6966.189 ms
V100: 44117.912 ms
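
The thread does not say how those numbers were measured; one plausible way, assuming the `pipeline` and `tokenizer` from the snippet above:

```python
import time

start = time.perf_counter()
sequences = pipeline(
    "Girafatron is obsessed with giraffes, ...",  # same prompt as in the snippet above
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
print(f"{(time.perf_counter() - start) * 1000:.3f} ms")  # wall-clock time per call
```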

> I am trying to load this model on Colab, but it doesn't load onto the GPU. What am I missing? I am using the code provided in the model card and installing the transformers library, but the model still does not load onto the GPU.

@Kamaljp Runtime (in the top toolbar, between Insert and Tools) -> Change runtime type -> select GPU under Hardware accelerator.
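
After switching the runtime, a quick sanity check (a sketch, assuming `model` was loaded per the model card with `device_map="auto"`, which requires the accelerate package):

```python
import torch

print(torch.cuda.is_available())        # should be True once a GPU runtime is attached
print(next(model.parameters()).device)  # expect cuda:0 if the model actually landed on the GPU
```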
