
ultra-feedback-dutch-cleaned-hq-spin-geitje-7b-ultra-sft_iter1

This model is a fine-tuned version of davidberenstein1957/ultra-feedback-dutch-cleaned-hq-spin-geitje-7b-ultra-sft_iter0. The training dataset is not specified in this card. The model achieves the following results on the evaluation set:

  • Loss: 0.0380
  • Rewards/real: -5.1867
  • Rewards/generated: -23.6116
  • Rewards/accuracies: 0.9778
  • Rewards/margins: 18.4250
  • Logps/generated: -690.4515
  • Logps/real: -469.2089
  • Logits/generated: -1.6815
  • Logits/real: -2.1280
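In DPO-style preference training (which SPIN follows), the reported margin is typically the difference between the reward assigned to the real (preferred) responses and the reward assigned to the model-generated responses. A minimal sketch, assuming that convention, reproduces the reported `Rewards/margins` from the two reward values above (up to rounding of the logged figures):

```python
# Evaluation metrics reported above.
rewards_real = -5.1867
rewards_generated = -23.6116

# Assumed relationship: margin = reward(real) - reward(generated).
margin = rewards_real - rewards_generated

print(round(margin, 4))  # 18.4249, matching the reported 18.4250 up to rounding
```

The small discrepancy (18.4249 vs. 18.4250) arises because the logged margin is computed from unrounded per-batch values before being rounded for display.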

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
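The effective batch sizes above follow from the per-device settings: 8 per device × 4 GPUs × 2 gradient-accumulation steps = 64 for training, and 8 × 4 = 32 for evaluation (no gradient accumulation at eval time). A minimal sketch of that arithmetic, plus the linear schedule with 10% warmup (the `linear_warmup_lr` helper is illustrative, not from the training code):

```python
# Effective batch sizes from the per-device hyperparameters above.
per_device_train_batch = 8
per_device_eval_batch = 8
num_devices = 4
gradient_accumulation_steps = 2

total_train_batch = per_device_train_batch * num_devices * gradient_accumulation_steps
total_eval_batch = per_device_eval_batch * num_devices

print(total_train_batch, total_eval_batch)  # 64 32

def linear_warmup_lr(step, total_steps, peak_lr=5e-7, warmup_ratio=0.1):
    """Illustrative linear schedule: ramp up to peak_lr over the first
    warmup_ratio of training, then decay linearly to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```

This matches the `total_train_batch_size: 64` and `total_eval_batch_size: 32` values reported above.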

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.591 | 0.04 | 25 | 0.4210 | -0.2501 | -1.0788 | 0.8500 | 0.8287 | -465.1227 | -419.8426 | -2.6984 | -2.7096 |
| 0.2223 | 0.08 | 50 | 0.2173 | -0.5659 | -3.0876 | 0.9176 | 2.5217 | -485.2113 | -423.0011 | -2.6306 | -2.6446 |
| 0.168 | 0.12 | 75 | 0.1532 | -0.7060 | -4.4771 | 0.9435 | 3.7711 | -499.1060 | -424.4022 | -2.5832 | -2.6005 |
| 0.1126 | 0.16 | 100 | 0.1218 | -1.2746 | -6.3162 | 0.9509 | 5.0415 | -517.4969 | -430.0886 | -2.5961 | -2.6118 |
| 0.0854 | 0.21 | 125 | 0.0921 | -1.7944 | -9.0378 | 0.9611 | 7.2433 | -544.7130 | -435.2866 | -2.5534 | -2.5859 |
| 0.0609 | 0.25 | 150 | 0.0738 | -1.6860 | -9.1926 | 0.9639 | 7.5065 | -546.2610 | -434.2025 | -2.5875 | -2.6239 |
| 0.0654 | 0.29 | 175 | 0.0733 | -2.0360 | -9.8189 | 0.9648 | 7.7828 | -552.5237 | -437.7025 | -2.5252 | -2.5698 |
| 0.0814 | 0.33 | 200 | 0.0714 | -2.3341 | -10.2294 | 0.9630 | 7.8952 | -556.6287 | -440.6832 | -2.4634 | -2.5260 |
| 0.0356 | 0.37 | 225 | 0.0698 | -2.6697 | -11.4164 | 0.9667 | 8.7467 | -568.4990 | -444.0394 | -2.4311 | -2.5142 |
| 0.0641 | 0.41 | 250 | 0.0586 | -2.3926 | -12.3053 | 0.9694 | 9.9126 | -577.3877 | -441.2684 | -2.3106 | -2.4202 |
| 0.0442 | 0.45 | 275 | 0.0672 | -2.5170 | -11.9462 | 0.9676 | 9.4293 | -573.7975 | -442.5117 | -2.3880 | -2.4773 |
| 0.0707 | 0.49 | 300 | 0.0540 | -3.8488 | -15.1469 | 0.9667 | 11.2982 | -605.8044 | -455.8299 | -2.2564 | -2.3913 |
| 0.0683 | 0.53 | 325 | 0.0574 | -5.2977 | -18.2377 | 0.9667 | 12.9400 | -636.7123 | -470.3190 | -2.1402 | -2.3222 |
| 0.0339 | 0.58 | 350 | 0.0495 | -3.7486 | -17.2926 | 0.9731 | 13.5439 | -627.2608 | -454.8286 | -2.1701 | -2.3731 |
| 0.0648 | 0.62 | 375 | 0.0537 | -2.4302 | -13.2604 | 0.9722 | 10.8301 | -586.9390 | -441.6444 | -2.3167 | -2.4783 |
| 0.0358 | 0.66 | 400 | 0.0460 | -3.8509 | -17.3389 | 0.9741 | 13.4880 | -627.7241 | -455.8509 | -2.1735 | -2.3874 |
| 0.0532 | 0.7 | 425 | 0.0483 | -4.3261 | -18.2030 | 0.9741 | 13.8769 | -636.3655 | -460.6029 | -2.1550 | -2.3751 |
| 0.0408 | 0.74 | 450 | 0.0567 | -4.8885 | -19.7272 | 0.9741 | 14.8387 | -651.6073 | -466.2276 | -2.2982 | -2.4811 |
| 0.0434 | 0.78 | 475 | 0.0467 | -2.8677 | -16.1120 | 0.9731 | 13.2443 | -615.4548 | -446.0187 | -2.1937 | -2.4242 |
| 0.0194 | 0.82 | 500 | 0.0455 | -3.2473 | -18.4707 | 0.9769 | 15.2234 | -639.0422 | -449.8151 | -2.0107 | -2.3291 |
| 0.0227 | 0.86 | 525 | 0.0543 | -4.5805 | -20.1131 | 0.9750 | 15.5326 | -655.4664 | -463.1471 | -2.2146 | -2.4100 |
| 0.0299 | 0.91 | 550 | 0.0481 | -4.3021 | -20.3869 | 0.9731 | 16.0848 | -658.2037 | -460.3627 | -2.0552 | -2.3301 |
| 0.0218 | 0.95 | 575 | 0.0464 | -4.4619 | -20.3587 | 0.9713 | 15.8967 | -657.9220 | -461.9616 | -1.9225 | -2.2635 |
| 0.0218 | 0.99 | 600 | 0.0451 | -5.3210 | -20.9811 | 0.9722 | 15.6602 | -664.1465 | -470.5517 | -1.9518 | -2.2964 |
| 0.0093 | 1.03 | 625 | 0.0429 | -4.3395 | -19.2716 | 0.9750 | 14.9321 | -647.0515 | -460.7374 | -1.7575 | -2.1708 |
| 0.0173 | 1.07 | 650 | 0.0492 | -4.1317 | -19.0745 | 0.9704 | 14.9428 | -645.0802 | -458.6593 | -1.8155 | -2.1757 |
| 0.0059 | 1.11 | 675 | 0.0449 | -5.7336 | -23.1577 | 0.9713 | 17.4241 | -685.9126 | -474.6784 | -1.6844 | -2.1123 |
| 0.0149 | 1.15 | 700 | 0.0608 | -7.1484 | -26.1989 | 0.9713 | 19.0504 | -716.3237 | -488.8266 | -2.0142 | -2.2748 |
| 0.0105 | 1.19 | 725 | 0.0479 | -4.4948 | -20.2513 | 0.9722 | 15.7564 | -656.8477 | -462.2903 | -2.1674 | -2.3962 |
| 0.032 | 1.23 | 750 | 0.0512 | -5.0950 | -21.3230 | 0.9685 | 16.2280 | -667.5649 | -468.2917 | -2.2426 | -2.4414 |
| 0.0042 | 1.28 | 775 | 0.0462 | -4.0296 | -19.2620 | 0.9704 | 15.2324 | -646.9548 | -457.6381 | -2.2156 | -2.4379 |
| 0.0041 | 1.32 | 800 | 0.0475 | -4.0348 | -19.8410 | 0.9731 | 15.8062 | -652.7453 | -457.6903 | -2.1330 | -2.3843 |
| 0.0075 | 1.36 | 825 | 0.0428 | -4.4696 | -20.8584 | 0.9722 | 16.3888 | -662.9192 | -462.0378 | -2.1122 | -2.3718 |
| 0.004 | 1.4 | 850 | 0.0468 | -6.2822 | -25.6273 | 0.9750 | 19.3451 | -710.6078 | -480.1642 | -1.7240 | -2.1709 |
| 0.0222 | 1.44 | 875 | 0.0584 | -6.0399 | -23.0778 | 0.9759 | 17.0379 | -685.1132 | -477.7408 | -1.6544 | -2.1242 |
| 0.0063 | 1.48 | 900 | 0.0490 | -3.8721 | -19.8020 | 0.9722 | 15.9298 | -652.3550 | -456.0635 | -1.7696 | -2.2026 |
| 0.006 | 1.52 | 925 | 0.0478 | -5.2822 | -23.7504 | 0.9750 | 18.4682 | -691.8392 | -470.1639 | -1.6461 | -2.1239 |
| 0.0169 | 1.56 | 950 | 0.0455 | -4.9375 | -22.9431 | 0.9731 | 18.0057 | -683.7665 | -466.7169 | -1.6890 | -2.1447 |
| 0.0063 | 1.6 | 975 | 0.0449 | -5.9782 | -25.0564 | 0.9741 | 19.0782 | -704.8994 | -477.1242 | -1.5890 | -2.0779 |
| 0.0144 | 1.65 | 1000 | 0.0428 | -5.2622 | -22.9304 | 0.9731 | 17.6682 | -683.6391 | -469.9639 | -1.6262 | -2.0859 |
| 0.0046 | 1.69 | 1025 | 0.0411 | -5.5146 | -24.0845 | 0.9759 | 18.5698 | -695.1800 | -472.4886 | -1.6070 | -2.0934 |
| 0.002 | 1.73 | 1050 | 0.0408 | -5.4174 | -23.7610 | 0.9750 | 18.3436 | -691.9457 | -471.5163 | -1.6779 | -2.1277 |
| 0.0047 | 1.77 | 1075 | 0.0411 | -5.6837 | -24.5512 | 0.9750 | 18.8674 | -699.8467 | -474.1796 | -1.7048 | -2.1412 |
| 0.0077 | 1.81 | 1100 | 0.0404 | -5.8712 | -25.3478 | 0.9759 | 19.4766 | -707.8129 | -476.0543 | -1.6257 | -2.0917 |
| 0.0145 | 1.85 | 1125 | 0.0385 | -5.0758 | -23.2450 | 0.9741 | 18.1692 | -686.7853 | -468.0999 | -1.6509 | -2.1029 |
| 0.0038 | 1.89 | 1150 | 0.0376 | -5.2077 | -23.5236 | 0.9759 | 18.3159 | -689.5715 | -469.4194 | -1.6736 | -2.1249 |
| 0.01 | 1.93 | 1175 | 0.0379 | -5.1247 | -23.3484 | 0.9750 | 18.2238 | -687.8193 | -468.5888 | -1.6969 | -2.1383 |
| 0.0055 | 1.98 | 1200 | 0.0380 | -5.1867 | -23.6116 | 0.9778 | 18.4250 | -690.4515 | -469.2089 | -1.6815 | -2.1280 |

Framework versions

  • Transformers 4.37.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
Model size

  • 7.24B params (Safetensors, BF16)