griffin-c3t-8L-v0.02-fineweb

Pretraining experiment with the griffin/recurrent_gemma architecture (~168M parameters).

Model description

Further training of pszemraj/griffin-v0.01-c3t-8layer-simplewiki-silu on the BEE-spoke-data/fineweb-1M_en-med dataset. It achieves the following results on the evaluation set:

  • Loss: 5.1888
  • Accuracy: 0.2326
  • Num Input Tokens Seen: 798,621,696
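A minimal usage sketch, hedged: it assumes the checkpoint's custom griffin/recurrent_gemma code exposes the standard causal-LM interface, which the trust_remote_code=True flag in the eval below suggests:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pszemraj/griffin-c3t-8L-v0.02-fineweb"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True is needed because the griffin/recurrent_gemma
# modeling code ships with the checkpoint rather than with transformers
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.float32
)

inputs = tokenizer("The huggingface_hub library is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```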

numbers

tl;dr: it's bad / it would need more training:

hf (pretrained=pszemraj/griffin-c3t-8L-v0.02-fineweb,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 4

| Tasks          | Version | Filter | n-shot | Metric     |       Value |       Stderr |
|----------------|--------:|--------|-------:|------------|------------:|-------------:|
| winogrande     |       1 | none   |      0 | acc        |      0.5146 | ±     0.0140 |
| piqa           |       1 | none   |      0 | acc        |      0.5511 | ±     0.0116 |
|                |         | none   |      0 | acc_norm   |      0.5261 | ±     0.0116 |
| openbookqa     |       1 | none   |      0 | acc        |      0.1140 | ±     0.0142 |
|                |         | none   |      0 | acc_norm   |      0.2240 | ±     0.0187 |
| lambada_openai |       1 | none   |      0 | perplexity | 209503.2246 | ± 11711.4041 |
|                |         | none   |      0 | acc        |      0.0000 | ±     0.0000 |
| boolq          |       2 | none   |      0 | acc        |      0.3783 | ±     0.0085 |
| arc_easy       |       1 | none   |      0 | acc        |      0.2593 | ±     0.0090 |
|                |         | none   |      0 | acc_norm   |      0.2774 | ±     0.0092 |
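These numbers can be re-run with EleutherAI's lm-evaluation-harness; a minimal sketch using its Python API (lm_eval.simple_evaluate, available in harness v0.4+), with model_args and batch size taken from the header line above:

```python
import lm_eval

# Zero-shot eval matching the configuration in the header line above
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=pszemraj/griffin-c3t-8L-v0.02-fineweb,"
        "trust_remote_code=True,dtype=float"
    ),
    tasks=["winogrande", "piqa", "openbookqa", "lambada_openai", "boolq", "arc_easy"],
    batch_size=4,
)
print(results["results"])
```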

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 80085
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-07
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
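For reference, a hedged sketch of how these settings map onto stock transformers TrainingArguments (the training script itself is not included with this card, so treat the field mapping as an assumption):

```python
from transformers import TrainingArguments

# Field names are the stock Trainer ones; values come from the list above
args = TrainingArguments(
    output_dir="griffin-c3t-8L-v0.02-fineweb",
    learning_rate=3e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=80085,
    gradient_accumulation_steps=32,   # 2 x 32 = 64 total train batch size
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-7,
)
```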

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Input Tokens Seen |
|--------------:|-------:|-----:|----------------:|---------:|------------------:|
| 6.0703        | 0.0656 |  400 | 6.2332          | 0.1701   | 52428800          |
| 5.723         | 0.1313 |  800 | 5.9116          | 0.1893   | 104857600         |
| 5.5106        | 0.1969 | 1200 | 5.7516          | 0.1976   | 157286400         |
| 5.455         | 0.2626 | 1600 | 5.6427          | 0.2032   | 209715200         |
| 5.3236        | 0.3282 | 2000 | 5.5567          | 0.2103   | 262144000         |
| 5.2764        | 0.3938 | 2400 | 5.4919          | 0.2151   | 314572800         |
| 5.1625        | 0.4595 | 2800 | 5.4436          | 0.2176   | 367001600         |
| 5.1851        | 0.5251 | 3200 | 5.3975          | 0.2206   | 419430400         |
| 5.0618        | 0.5908 | 3600 | 5.3624          | 0.2199   | 471859200         |
| 5.0278        | 0.6564 | 4000 | 5.3242          | 0.2236   | 524288000         |
| 5.0389        | 0.7220 | 4400 | 5.2920          | 0.2264   | 576716800         |
| 4.9732        | 0.7877 | 4800 | 5.2674          | 0.2276   | 629145600         |
| 4.9375        | 0.8533 | 5200 | 5.2418          | 0.2292   | 681574400         |
| 4.9322        | 0.9190 | 5600 | 5.2166          | 0.2312   | 734003200         |
| 4.8818        | 0.9846 | 6000 | 5.1981          | 0.2315   | 786432000         |
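As a sanity check on the token accounting (assuming fixed-length packed sequences, which the card does not state explicitly), the per-interval counts imply a 2048-token context length:

```python
# Each eval interval spans 400 optimizer steps at an effective batch of 64
tokens_per_interval = 52_428_800          # first row of the table above
steps_per_interval, effective_batch = 400, 64
seq_len = tokens_per_interval // (steps_per_interval * effective_batch)
assert seq_len == 2048                    # implied context length
# Cross-check against the last row: 6000 steps -> 786,432,000 tokens
assert 6000 * effective_batch * seq_len == 786_432_000
```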

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1
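A minimal environment check against these pins (version mismatches don't necessarily break anything, but custom-code checkpoints are sensitive to the transformers version):

```python
import datasets, tokenizers, torch, transformers

# Versions this card was produced with
expected = {
    transformers: "4.40.1",
    torch: "2.3.0+cu121",
    datasets: "2.19.0",
    tokenizers: "0.19.1",
}
for module, version in expected.items():
    if module.__version__ != version:
        print(f"{module.__name__}: have {module.__version__}, card used {version}")
```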