
# griffin-llama3t-8L-v0.02-fineweb

Pretraining experiment with the griffin/recurrent_gemma architecture (~234M parameters). This variant uses the Llama-3 tokenizer.

## Model description

Further training of pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu on the BEE-spoke-data/fineweb-1M_en-med dataset. It achieves the following results on the evaluation set:

  • Loss: 5.6538
  • Accuracy: 0.1881
  • Num Input Tokens Seen: 766509056
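
A minimal loading sketch, assuming the standard `transformers` auto classes (the custom griffin/recurrent_gemma code lives in the model repo, so `trust_remote_code=True` is needed, as in the eval config below; the prompt is just an arbitrary example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pszemraj/griffin-llama3t-8L-v0.02-fineweb"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# custom architecture code is pulled from the model repo
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The griffin architecture is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```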

## Evals

tl;dr: it's bad and would need more training:

`hf (pretrained=pszemraj/griffin-llama3t-8L-v0.02-fineweb,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 4`

| Tasks          | Version | Filter | n-shot | Metric     |       Value |   |     Stderr |
|----------------|--------:|--------|-------:|------------|------------:|---|-----------:|
| winogrande     |       1 | none   |      0 | acc        |      0.4964 | ± |     0.0141 |
| piqa           |       1 | none   |      0 | acc        |      0.5332 | ± |     0.0116 |
|                |         | none   |      0 | acc_norm   |      0.5299 | ± |     0.0116 |
| openbookqa     |       1 | none   |      0 | acc        |      0.1280 | ± |     0.0150 |
|                |         | none   |      0 | acc_norm   |      0.2320 | ± |     0.0189 |
| lambada_openai |       1 | none   |      0 | perplexity | 638060.0702 | ± | 43608.0044 |
|                |         | none   |      0 | acc        |      0.0000 | ± |     0.0000 |
| boolq          |       2 | none   |      0 | acc        |      0.3783 | ± |     0.0085 |
| arc_easy       |       1 | none   |      0 | acc        |      0.2614 | ± |     0.0090 |
|                |         | none   |      0 | acc_norm   |      0.2744 | ± |     0.0092 |
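
These numbers come from the EleutherAI lm-evaluation-harness (the config line above the table is its run summary). A hedged reproduction sketch via the harness's Python entry point; the task list is inferred from the table and the exact flags of the original run are an assumption:

```python
import lm_eval

# assumed reproduction of the run summarized above; task list inferred from the results table
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=pszemraj/griffin-llama3t-8L-v0.02-fineweb,trust_remote_code=True,dtype=float",
    tasks=["winogrande", "piqa", "openbookqa", "lambada_openai", "boolq", "arc_easy"],
    num_fewshot=0,  # matches the n-shot column above
    batch_size=4,
)
print(results["results"])
```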

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 80085
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-07
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
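
For orientation, a sketch of how these map onto `transformers.TrainingArguments`, assuming the run used the Hugging Face `Trainer` (other options from the actual training script are not shown in this card):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="griffin-llama3t-8L-v0.02-fineweb",  # hypothetical output path
    learning_rate=3e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=32,  # 2 * 32 = 64 total train batch size
    seed=80085,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-7,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
)
```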

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|
| 6.4019        | 0.0684 | 400  | 6.7690          | 0.1278   | 52428800          |
| 6.0547        | 0.1368 | 800  | 6.4214          | 0.1460   | 104857600         |
| 5.8133        | 0.2052 | 1200 | 6.2566          | 0.1550   | 157286400         |
| 5.7212        | 0.2736 | 1600 | 6.1411          | 0.1620   | 209715200         |
| 5.6175        | 0.3420 | 2000 | 6.0502          | 0.1669   | 262144000         |
| 5.5014        | 0.4104 | 2400 | 5.9827          | 0.1687   | 314572800         |
| 5.4882        | 0.4788 | 2800 | 5.9203          | 0.1731   | 367001600         |
| 5.3972        | 0.5472 | 3200 | 5.8614          | 0.1782   | 419430400         |
| 5.3983        | 0.6156 | 3600 | 5.8340          | 0.1773   | 471859200         |
| 5.3175        | 0.6840 | 4000 | 5.7916          | 0.1814   | 524288000         |
| 5.3014        | 0.7524 | 4400 | 5.7565          | 0.1814   | 576716800         |
| 5.2749        | 0.8208 | 4800 | 5.7303          | 0.1849   | 629145600         |
| 5.2264        | 0.8892 | 5200 | 5.6993          | 0.1850   | 681574400         |
| 5.2107        | 0.9576 | 5600 | 5.6745          | 0.1884   | 734003200         |

### Framework versions

  • Transformers 4.40.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1