Runpod Deployment Troubleshooting

#2
by grandignatz - opened

The runpod template installation always gets stuck at:
text_generation_launcher: Download file: model-00006-of-00019.safetensors

We tried multiple Pods (H100, A100, A6000) and everywhere it gets stuck at model part 6.

Runpod Support was not able to help.

(Screenshot attached: Bildschirmfoto 2024-01-05 um 20.42.28.png)

Trelis org

Yeah, you can get it to work by reloading the pod a few times; each restart resumes the download from the last completed shard, and eventually all of the shards will be loaded.

I'm working on pushing 8-bit and 4-bit models to the Hub, which will reduce the download size, speed things up, and perhaps side-step the issue. I've already updated the Runpod template to download the 8-bit weights. I'm testing that now and will get it working on Monday.

Ok, great, thank you. I already tried restarting the different pods about 20 times yesterday but never got past part 6. Will try again.

I tried it again and it seems it now used the 8-bit branch, which has only 5 parts. The download completed, but I only get an empty response from the API:
{"generated_text":""}

Server log says:
generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(200), return_full_text: None, stop: [], truncate: None, watermark: false, details: false, decoder_input_details: false, seed: None, top_n_tokens: None } total_time="15.697707697s" validation_time="465.588µs" queue_time="86.82µs" inference_time="15.697155549s" time_per_token="78.485777ms" seed="None"}: text_generation_router::server: router/src/server.rs:289: Success

With generate_stream I get:
data:{"token":{"id":0,"text":"","logprob":null,"special":true},"generated_text":null,"details":null}

RonanMcGovern changed discussion title from Issue Runpod Deployment to Runpod Deployment Troubleshooting

Hi folks, some guidance here.

Best Current Approach (use the main branch and then --quantize eetq; an example launch command follows the list below):

  • I've just set the pod to download the 16-bit weights from the main branch.
  • There are often issues downloading the weights: the download gets stuck at various points and requires you to click the three-line menu and then "Restart Pod" at a few points. Typically I need to restart 3-4 times to get all of the weights downloaded. After the download, it can take 10-15 minutes for the shards to load onto the GPU (at least on an A6000).
  • The Runpod template is the one on the main model card.
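
For reference, a launch along these lines would look roughly as follows (the model id is a placeholder; --model-id and --quantize eetq are standard text-generation-inference launcher flags, and the rest is an assumption about a typical setup):

    docker run --gpus all --shm-size 1g -p 8080:80 \
      ghcr.io/huggingface/text-generation-inference:latest \
      --model-id <your-model-id> \
      --quantize eetq

The Runpod template wires this up for you; the command is shown only so you can see where the quantization flag fits.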

Work in Progress #A

  • Ideally, rather than downloading the full 16-bit weights, we would download 8-bit or 4-bit (nf4) weights.
  • However, there is a bug stopping 8-bit and 4-bit weights from being pushed to the Hub. I have opened issues (4-bit, 8-bit) and will write back here when I have more updates; a sketch of the blocked workflow is below.
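
For context, the blocked workflow is roughly this: load the model with a bitsandbytes quantization config and push the quantized weights to the Hub. A sketch with placeholder model ids (the BitsAndBytesConfig fields are the standard transformers API; whether the final push succeeds is exactly what the linked issues are about):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # 4-bit NormalFloat (nf4) quantization via bitsandbytes.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "<base-model-id>",              # placeholder
        quantization_config=bnb_config,
        device_map="auto",
    )

    # This push is the step currently blocked by the bug.
    model.push_to_hub("<org>/<model>-nf4")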

Work in Progress #B

  • I don't know the root cause of the downloads getting stuck, but I see the same issue with the raw Mixtral model as well. I have posted an issue about this on TGI.

I have been able to get past the weights getting stuck, but once the model is successfully deployed, the generated_text is still empty. Is there a resolution for the empty response after a successful deployment?

Trelis org

Howdy Reed. I've just tested again and the template on the model card is working. For example, using the ADVANCED inference repo, I'm getting:

user: What clothes should I wear? I am in Dublin

function_call: {
    "name": "get_current_weather",
    "arguments": {
        "city": "Dublin"
    }
}

function_response: {
    "temperature": "18 C",
    "condition": "Partly Cloudy"
}

assistant: You should wear a sweater

This matches the YouTube video about Mixtral. I also ran a test with no functions and it ran fine (a speed test, as per the YouTube video).

Are you using apply_chat_template? The prompt formatting is crucial.
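
For anyone unsure what that means in practice, here is a minimal sketch (the model id is a placeholder; apply_chat_template is the standard transformers tokenizer method):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("<model-id>")  # placeholder

    messages = [
        {"role": "user", "content": "What clothes should I wear? I am in Dublin"},
    ]

    # Renders the messages in the exact prompt format the model was
    # trained on, ending with the assistant turn ready for generation.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)  # send this string as "inputs" to TGI's /generate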

P.S. I'm working on an AWQ template now, which should be quicker to download.

Trelis org

Ok, the AWQ one-click Runpod template is now on the model card. This is now the recommended way to run inference. The model is about 25 GB (instead of ~100 GB), so it is quicker to download.
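
For anyone launching manually rather than via the template, the only change from the earlier eetq sketch is the quantization flag (the model id is again a placeholder; --quantize awq is the standard TGI launcher option for AWQ weights):

    docker run --gpus all --shm-size 1g -p 8080:80 \
      ghcr.io/huggingface/text-generation-inference:latest \
      --model-id <awq-model-id> \
      --quantize awq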

I'm therefore closing this issue.

The downloading bug with TGI remains open on their GitHub here.

If you face new issues, just create a new issue, and please provide enough detail that I can replicate the problem.

RonanMcGovern changed discussion status to closed

This is working for me now, thanks!
