Running on Apple's M1 laptop?

#5
by shiwanlin - opened

Has anyone tried this on an M1? There is no CUDA support there, of course, but I haven't even gotten that far:

python3 generate_openelm.py --model apple/OpenELM-270M-Instruct --hf_access_token hf_XXXXX --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2

...

WARNING:root:No CUDA device detected, using cpu, expect slower speeds.
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1.30k/1.30k [00:00<00:00, 4.05MB/s]
configuration_openelm.py: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 14.3k/14.3k [00:00<00:00, 38.0MB/s]
A new version of the following files was downloaded from https://huggingface.co/apple/OpenELM-270M-Instruct:
• configuration_openelm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_openelm.py: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 39.3k/39.3k [00:00<00:00, 16.3MB/s]
A new version of the following files was downloaded from https://huggingface.co/apple/OpenELM-270M-Instruct:
• modeling_openelm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Traceback (most recent call last):
  File "/Volumes/codes/OpenELM/generate_openelm.py", line 220, in <module>
    output_text, genertaion_time = generate(
  File "/Volumes/codes/OpenELM/generate_openelm.py", line 85, in generate
    model = AutoModelForCausalLM.from_pretrained(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 475, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 443, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 164, in get_class_in_module
    module = importlib.import_module(module_path)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/xxx/.cache/huggingface/modules/transformers_modules/apple/OpenELM-270M-Instruct/1096244b62a03bedc770f8521512fd071f3aa5fd/modeling_openelm.py", line 15, in <module>
    from transformers.cache_utils import Cache, DynamicCache, StaticCache
ModuleNotFoundError: No module named 'transformers.cache_utils'

This is where it breaks, at the top of modeling_openelm.py:

# For licensing see accompanying LICENSE file.
#
# Copyright (C) 2024 Apple Inc. All Rights Reserved.

from typing import List, Optional, Tuple, Union

import torch
import torch.utils.checkpoint
from torch import Tensor, nn
from torch.nn import CrossEntropyLoss
from torch.nn import functional as F
from transformers import PreTrainedModel
from transformers.activations import ACT2FN
from transformers.cache_utils import Cache, DynamicCache, StaticCache
from transformers.modeling_outputs import (
    BaseModelOutputWithPast,
    CausalLMOutputWithPast,
)
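
For reference, the failing import at the top of that file doubles as a quick check of whether the installed transformers release is new enough. A minimal sketch (cache_utils, and StaticCache in particular, only exist in fairly recent transformers versions):

import transformers

print(transformers.__version__)

# If the next line raises ModuleNotFoundError / ImportError, the installed
# transformers predates cache_utils (or StaticCache) and needs an upgrade.
from transformers.cache_utils import Cache, DynamicCache, StaticCache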

I got past the 'transformers' issue by pulling their GitHub repo and building from source, then added "--device mps", which (after installing a PyTorch nightly) gets past the 'No CUDA device' warning. However, loading the 3B-parameter model resulted in: "RuntimeError: MPS backend out of memory (MPS allocated: 9.05 GB, other allocations: 832.00 KB, max allowed: 9.07 GB). Tried to allocate 36.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure)." This recurs even after setting that flag. The smaller 1.1B model gets past it, but now I am running into a permissions issue with the tokenizer (see the other discussion: https://huggingface.co/apple/OpenELM-3B-Instruct/discussions/4).
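For anyone taking the same route, this is roughly what that setup looks like in Python. It is a sketch only: the device choice mirrors what --device mps does in the script, and the environment variable is the one suggested by the error message.

import os

# The high-watermark flag has to be in the environment before PyTorch initializes its
# MPS allocator, so set it here (or export it in the shell) before anything touches torch.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

import torch

# Sanity-check that the MPS backend is actually available; this needs Apple Silicon
# plus a recent (or nightly) PyTorch build.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(device)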

Thanks for sharing your approach and I will give it a try...

Just an update:

In my case, the transformers problem was fixed with:

pip install transformers --upgrade

(A plain pip install transformers without --upgrade didn't work.)
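
Related to that, the download log above suggests pinning a revision so the remote code files aren't re-downloaded on every run. A rough sketch of what that looks like; the commit hash is simply the one visible in the traceback's cache path, and trust_remote_code is needed because OpenELM ships its own modeling code:

from transformers import AutoModelForCausalLM

# Pin the repo to a specific commit so newer versions of configuration_openelm.py /
# modeling_openelm.py are not fetched (and trusted) automatically.
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct",
    revision="1096244b62a03bedc770f8521512fd071f3aa5fd",
    trust_remote_code=True,
)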

I also had to get past the Meta Llama access authorization hurdle. After that, 270M-Instruct works!
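
For others hitting the same hurdle: the generate script appears to load the Llama tokenizer from the gated meta-llama repo, so the access request has to be approved first. A quick way to check that your token actually has access (just a sketch; the token is the same hf_... one passed on the command line):

from transformers import AutoTokenizer

# This fails with an authorization error until the meta-llama access request is approved.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", token="hf_XXXXX")
print(tokenizer("Once upon a time there was"))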

A final update: I also got 3B-Instruct running on the M1, but it took 280-290 seconds (!). The GPU seemed to be in use at around 40% load, while the CPU sat at ~100% of one core.

Please also remove your Hugging Face token from the comment.

What should I do about this? "Your request to access model meta-llama/Llama-2-7b-hf is awaiting a review from the repo authors."

@shiwanlin how did you get the GPU to work? It is not using CUDA.
