Minimum requirements for inference

#2 by bilelm - opened

Hello,
Thank you for sharing this model!
Could you specify the minimum requirements needed to run this model for inference?

Technology Innovation Institute org

It requires at least 14 GB of memory; the smallest GPU I've tried is an A10, which works well.

It's also trained in bf16, which is only available on Ampere and later GPUs; I would expect some performance degradation when running it in fp16 instead.
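
For concreteness, here is a minimal bf16 loading sketch (an illustrative example, not from this thread: it assumes the `tiiuae/falcon-7b` repo id, a recent transformers release, and the accelerate package for `device_map="auto"`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"  # assumption: swap in falcon-7b-instruct if that's the repo in question
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fall back to torch.float16 on pre-Ampere GPUs, with possible quality loss
    trust_remote_code=True,      # Falcon shipped custom modeling code at release
    device_map="auto",           # requires the accelerate package
)
```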

Thank you!
I will try it on an A5000 (24 GB); I hope that's enough.

I am running the given code on Windows (CPU, 32 GB RAM), but it keeps running for 2+ hours without printing the results.
Does anyone have an idea how to solve this?

I am trying to load this model on Colab, but it doesn't load onto the GPU.
What am I missing? I am using the code provided in the model card and installing the transformers library, but the model still does not load onto the GPU.

I tried it on a 40 GB A100 and it worked, but slowly: a single input took about 10 minutes.
I then got an 80 GB A100, but after loading the checkpoint it crashed at:

```
return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 79.32 GiB total capacity; 77.15 GiB already allocated; 832.00 KiB free; 78.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

What am I missing?
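
For reference, the allocator hint in that traceback can be applied before the first CUDA allocation; a minimal sketch, where `128` is an illustrative value rather than a recommendation from this thread:

```python
# Set the allocator option named in the OOM message before torch touches the GPU.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # illustrative split size, not a tuned value

import torch  # import (or at least first allocate) only after setting the variable
```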

Technology Innovation Institute org

@zynos it's unlikely you will get anything in a reasonable time on a CPU; you really need a GPU for this sort of model.

@akashcollectiv are you sure you are not trying to load Falcon-40B instead? The 7B should fit fine on an A100 80GB.

FalconLLM changed discussion status to closed

Are all of you working with PyTorch 2?

Me too; I get OOM when the sequence length exceeds ~1200,
using an A100 80GB, bf16, and inference only (no_grad) for the Falcon-7B model.
And yes, I'm using PyTorch 2.0.
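
A minimal sketch of memory-lean generation, assuming `model` and `tokenizer` are already loaded as in the model card and `prompt` is your input string; `torch.inference_mode()` avoids autograd bookkeeping, and `max_new_tokens` caps the KV-cache growth that tends to blow up at long sequence lengths:

```python
import torch

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():  # no autograd state is kept during generation
    output_ids = model.generate(
        **inputs,
        max_new_tokens=200,  # bound the generated length rather than total max_length
        do_sample=True,
        top_k=10,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```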

> I tried it on a 40 GB A100 and it worked, but slowly: a single input took about 10 minutes. I then got an 80 GB A100, but after loading the checkpoint it crashed with `torch.cuda.OutOfMemoryError` (full traceback above). What am I missing?

Have you played with batch size? Halving the batch size seems to help.

> Have you played with batch size? Halving the batch size seems to help.

The batch size I run with is 1.

Can anyone tell me the minimum hardware requirements for falcon-7b-instruct? I want to use it for question answering over given context/document data.

On Colab, with the model card's inference call:

```python
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
```

A100: 6966.189 ms
V100: 44117.912 ms
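
The thread does not say how those numbers were measured; one plausible way, assuming the `pipeline` and `tokenizer` from the snippet above:

```python
import time

start = time.perf_counter()
sequences = pipeline(
    "Girafatron is obsessed with giraffes, ...",  # same prompt as in the snippet above
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
print(f"{(time.perf_counter() - start) * 1000:.3f} ms")  # wall-clock time per call
```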

> I am trying to load this model on Colab, but it doesn't load onto the GPU. What am I missing? I am using the code provided in the model card and installing the transformers library, but the model still does not load onto the GPU.

@Kamaljp Runtime (in the top toolbar, between Insert and Tools) -> Change runtime type -> select GPU under Hardware accelerator.
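
After switching the runtime, a quick sanity check (a sketch, assuming `model` was loaded per the model card with `device_map="auto"`, which requires the accelerate package):

```python
import torch

print(torch.cuda.is_available())        # should be True once a GPU runtime is attached
print(next(model.parameters()).device)  # expect cuda:0 if the model actually landed on the GPU
```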
