Running on Apple's M1 laptop?

#5
by shiwanlin - opened

Has anyone tried this on an M1? There is no CUDA support there, of course, but I haven't even gotten that far:

python3 generate_openelm.py --model apple/OpenELM-270M-Instruct --hf_access_token hf_XXXXX --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2

...

WARNING:root:No CUDA device detected, using cpu, expect slower speeds.
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.30k/1.30k [00:00<00:00, 4.05MB/s]
configuration_openelm.py: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 14.3k/14.3k [00:00<00:00, 38.0MB/s]
A new version of the following files was downloaded from https://huggingface.co/apple/OpenELM-270M-Instruct:
• configuration_openelm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_openelm.py: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 39.3k/39.3k [00:00<00:00, 16.3MB/s]
A new version of the following files was downloaded from https://huggingface.co/apple/OpenELM-270M-Instruct:
• modeling_openelm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Traceback (most recent call last):
  File "/Volumes/codes/OpenELM/generate_openelm.py", line 220, in <module>
    output_text, genertaion_time = generate(
  File "/Volumes/codes/OpenELM/generate_openelm.py", line 85, in generate
    model = AutoModelForCausalLM.from_pretrained(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 475, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 443, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 164, in get_class_in_module
    module = importlib.import_module(module_path)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/xxx/.cache/huggingface/modules/transformers_modules/apple/OpenELM-270M-Instruct/1096244b62a03bedc770f8521512fd071f3aa5fd/modeling_openelm.py", line 15, in <module>
    from transformers.cache_utils import Cache, DynamicCache, StaticCache
ModuleNotFoundError: No module named 'transformers.cache_utils'

This is where it breaks, at the top of modeling_openelm.py:

# For licensing see accompanying LICENSE file.
#
# Copyright (C) 2024 Apple Inc. All Rights Reserved.

from typing import List, Optional, Tuple, Union

import torch
import torch.utils.checkpoint
from torch import Tensor, nn
from torch.nn import CrossEntropyLoss
from torch.nn import functional as F
from transformers import PreTrainedModel
from transformers.activations import ACT2FN
from transformers.cache_utils import Cache, DynamicCache, StaticCache
from transformers.modeling_outputs import (
    BaseModelOutputWithPast,
    CausalLMOutputWithPast,
)
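
For reference, the failing import at the top of that file doubles as a quick check of whether the installed transformers release is new enough. A minimal sketch (cache_utils, and StaticCache in particular, only exist in fairly recent transformers versions):

import transformers

print(transformers.__version__)

# If the next line raises ModuleNotFoundError / ImportError, the installed
# transformers predates cache_utils (or StaticCache) and needs an upgrade.
from transformers.cache_utils import Cache, DynamicCache, StaticCache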

I got past the 'transformers' issue by pulling their GitHub repo and building from source, then added "--device mps", which (after installing a PyTorch nightly) gets past the 'No CUDA device' warning. However, loading the 3B-parameter model resulted in: "RuntimeError: MPS backend out of memory (MPS allocated: 9.05 GB, other allocations: 832.00 KB, max allowed: 9.07 GB). Tried to allocate 36.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure)." This recurs even after setting that flag. The smaller 1.1B model gets past it, but now I am running into a permissions issue with the tokenizer (see the other discussion: https://huggingface.co/apple/OpenELM-3B-Instruct/discussions/4).
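For anyone taking the same route, this is roughly what that setup looks like in Python. It is a sketch only: the device choice mirrors what --device mps does in the script, and the environment variable is the one suggested by the error message.

import os

# The high-watermark flag has to be in the environment before PyTorch initializes its
# MPS allocator, so set it here (or export it in the shell) before anything touches torch.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

import torch

# Sanity-check that the MPS backend is actually available; this needs Apple Silicon
# plus a recent (or nightly) PyTorch build.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(device)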

Thanks for sharing your approach and I will give it a try...

Just an update:

In my case, the transformers problem was fixed with:

pip install transformers --upgrade

(A plain pip install transformers without --upgrade didn't work.)
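
Related to that, the download log above suggests pinning a revision so the remote code files aren't re-downloaded on every run. A rough sketch of what that looks like; the commit hash is simply the one visible in the traceback's cache path, and trust_remote_code is needed because OpenELM ships its own modeling code:

from transformers import AutoModelForCausalLM

# Pin the repo to a specific commit so newer versions of configuration_openelm.py /
# modeling_openelm.py are not fetched (and trusted) automatically.
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct",
    revision="1096244b62a03bedc770f8521512fd071f3aa5fd",
    trust_remote_code=True,
)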

I also had to get past the Meta Llama access authorization hurdle. After that, 270M-Instruct works!
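
For others hitting the same hurdle: the generate script appears to load the Llama tokenizer from the gated meta-llama repo, so the access request has to be approved first. A quick way to check that your token actually has access (just a sketch; the token is the same hf_... one passed on the command line):

from transformers import AutoTokenizer

# This fails with an authorization error until the meta-llama access request is approved.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", token="hf_XXXXX")
print(tokenizer("Once upon a time there was"))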

A final update: I also got 3B-Instruct running on the M1, but it took 280-290 seconds (!). The GPU seemed to be in use at around 40% load, while the CPU sat at ~100% of one core.

Please also remove your Hugging Face token from the comment.

What should I do about this? "Your request to access model meta-llama/Llama-2-7b-hf is awaiting a review from the repo authors."

@shiwanlin how did you get the GPU to work? It is not using CUDA.
