How many active parameters does this model have?

#6 opened by lewtun

Does anyone know how many active parameters this model has? Is it a similar calculation to the Mixtral-8x7B model or something new altogether?

Since 2 experts are used in each forward pass, the name suggests 2 × 22B ≈ 44B. However, it should be lower than that, because only the expert MLPs are replicated — the attention, embedding, and router weights are shared — so the active count lands somewhere in the 30–40B range.

This model has 140,620,634,112 parameters in total.
Each expert is a SwiGLU MLP with three hidden_size × intermediate_size projection matrices (w1, w2, w3), i.e. 3 × (hidden_size × intermediate_size) = 301,989,888 parameters.
With 56 layers and top-2 routing, 8 − 2 = 6 experts are skipped in each layer, so the number of active parameters is 140,620,634,112 − 56 × (8 − 2) × 301,989,888 = 39,152,031,744, which is approximately 39B.
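As a quick back-of-the-envelope check, here is the same arithmetic from the config values alone, assuming hidden_size = 6144 and intermediate_size = 16384 (which reproduce the 301,989,888 per-expert figure above):

# Pure-arithmetic sanity check using values from the model config.
hidden_size = 6144
intermediate_size = 16384
n_layers, n_experts, top_k = 56, 8, 2
per_expert = 3 * hidden_size * intermediate_size   # w1, w2, w3 projections
active = 140_620_634_112 - n_layers * (n_experts - top_k) * per_expert
print(per_expert, active)  # 301989888 39152031744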

In case you want to check this yourself, here is a short snippet:

from transformers import AutoConfig, AutoModelForCausalLM
from accelerate import init_empty_weights

config = AutoConfig.from_pretrained("mistral-community/Mixtral-8x22B-v0.1")
# Instantiate on the meta device: no weights are downloaded or allocated.
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)
N_total = sum(p.numel() for p in model.parameters())
# All experts are identical, so count the parameters of a single one.
expert = model.model.layers[0].block_sparse_moe.experts[0]
N_per_expert = sum(p.numel() for p in expert.parameters())
# Subtract the (8 - 2) = 6 experts skipped in each of the 56 layers.
n_skipped = config.num_hidden_layers * (config.num_local_experts - config.num_experts_per_tok)
print(N_total - n_skipped * N_per_expert)  # 39152031744
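Two notes on the snippet: init_empty_weights builds the model on the meta device, so only the config file is fetched and no memory is allocated for the 140B weights. And the count treats everything outside the skipped experts as active — attention, embeddings, layer norms, and the per-layer router gates — since all of those run for every token.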
