
tinyllama-1.1b-sum-dpo-full

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full, trained with DPO on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6342
  • Rewards/chosen: -1.8568
  • Rewards/rejected: -2.3204
  • Rewards/accuracies: 0.6580
  • Rewards/margins: 0.4635
  • Logps/rejected: -295.1929
  • Logps/chosen: -244.3875
  • Logits/rejected: -1.3920
  • Logits/chosen: -1.4190
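
For context on the metric names above: they match the logging of trl's DPOTrainer, where each completion's implicit reward is the β-scaled log-probability ratio between the policy and the frozen SFT reference. The standard DPO definitions (Rafailov et al., 2023) are sketched below; the card itself does not spell them out.

```latex
% Implicit DPO reward of completion y for prompt x:
% beta scales the log-prob ratio between the policy and the frozen SFT reference.
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}

% Loss over preference pairs (y_w = chosen, y_l = rejected).
% "Rewards/margins" is r_\theta(x, y_w) - r_\theta(x, y_l);
% "Rewards/accuracies" is the fraction of pairs where that margin is positive.
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x, y_w, y_l)}
    \left[ \log \sigma\!\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right) \right]
```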

Model description

More information needed

Intended uses & limitations

More information needed
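
Pending fuller guidance, a minimal inference sketch is shown below. It assumes standard causal-LM usage via transformers and the "TL;DR:" prompt convention from the Reddit TL;DR portion of summarize_from_feedback; neither is confirmed by this card.

```python
# Minimal inference sketch. Assumptions not stated by this card:
# standard causal-LM usage, and the "TL;DR:" prompt convention from
# the summarize_from_feedback (Reddit TL;DR) setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

post = "<the Reddit post or article you want summarized>"
prompt = f"{post}\n\nTL;DR:"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens (the summary), not the prompt.
summary = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(summary.strip())
```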

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding trl setup follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
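
The sketch below shows how these settings could be wired into trl's DPOTrainer. Everything not in the list above is an assumption: the trl API version (~0.8.x, contemporary with Transformers 4.39), beta=0.1 (trl's default), the TL;DR prompt format, and this particular mapping of the comparisons data to preference pairs.

```python
# Sketch of a matching trl DPO run. Assumptions not stated by this card:
# trl ~0.8.x API, beta=0.1 (trl's default), the TL;DR prompt format, and
# this particular mapping of the comparisons split to preference pairs.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

sft_model_id = "martimfasantos/tinyllama-1.1b-sum-sft-full"  # the SFT starting point
model = AutoModelForCausalLM.from_pretrained(sft_model_id)
tokenizer = AutoTokenizer.from_pretrained(sft_model_id)

raw = load_dataset("openai/summarize_from_feedback", "comparisons", split="train")

def to_pairs(ex):
    # Each comparison holds two candidate summaries and a human choice.
    prompt = f"{ex['info']['post']}\n\nTL;DR:"
    return {
        "prompt": prompt,
        "chosen": ex["summaries"][ex["choice"]]["text"],
        "rejected": ex["summaries"][1 - ex["choice"]]["text"],
    }

train_dataset = raw.map(to_pairs, remove_columns=raw.column_names)

# Adam betas (0.9, 0.999) and epsilon 1e-8 are the TrainingArguments defaults.
args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # 8 * 2 = total train batch size 16
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption: matches the BF16 weights
    remove_unused_columns=False,     # required by trl's DPO data collator
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # trl clones the policy as the frozen reference model
    args=args,
    beta=0.1,         # assumption: the card does not report beta
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```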

Training results

Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen
0.6929 0.02 100 0.6932 -0.0000 0.0000 0.4986 -0.0000 -63.1568 -58.7055 -3.1598 -3.1655
0.693 0.03 200 0.6931 0.0002 0.0002 0.5128 0.0000 -63.1375 -58.6803 -3.1596 -3.1653
0.6926 0.05 300 0.6930 0.0006 0.0003 0.5395 0.0003 -63.1272 -58.6442 -3.1581 -3.1638
0.691 0.07 400 0.6926 0.0014 0.0004 0.5611 0.0010 -63.1156 -58.5606 -3.1547 -3.1603
0.6907 0.09 500 0.6921 0.0021 -0.0000 0.5755 0.0021 -63.1621 -58.4974 -3.1459 -3.1515
0.6852 0.1 600 0.6915 0.0010 -0.0025 0.5822 0.0035 -63.4056 -58.6003 -3.1331 -3.1388
0.6854 0.12 700 0.6905 -0.0024 -0.0080 0.5895 0.0056 -63.9547 -58.9453 -3.1150 -3.1207
0.6829 0.14 800 0.6887 -0.0198 -0.0294 0.5734 0.0097 -66.0990 -60.6796 -3.0887 -3.0944
0.6773 0.16 900 0.6863 -0.0499 -0.0651 0.5929 0.0152 -69.6642 -63.6925 -3.0513 -3.0570
0.6818 0.17 1000 0.6837 -0.0860 -0.1071 0.5971 0.0212 -73.8714 -67.3013 -3.0031 -3.0087
0.6715 0.19 1100 0.6800 -0.1307 -0.1606 0.6057 0.0300 -79.2216 -71.7704 -2.9405 -2.9461
0.6651 0.21 1200 0.6756 -0.1933 -0.2344 0.5997 0.0411 -86.5957 -78.0297 -2.8460 -2.8516
0.663 0.22 1300 0.6691 -0.2828 -0.3409 0.6171 0.0581 -97.2443 -86.9854 -2.7796 -2.7856
0.6329 0.24 1400 0.6610 -0.3769 -0.4582 0.6185 0.0813 -108.9814 -96.3935 -2.6744 -2.6805
0.6356 0.26 1500 0.6537 -0.4858 -0.5921 0.6380 0.1063 -122.3668 -107.2818 -2.5109 -2.5177
0.6275 0.28 1600 0.6452 -0.5829 -0.7205 0.6364 0.1376 -135.2118 -116.9967 -2.4086 -2.4171
0.6315 0.29 1700 0.6434 -0.5896 -0.7345 0.6336 0.1449 -136.6092 -117.6634 -2.3275 -2.3370
0.6166 0.31 1800 0.6394 -0.7915 -0.9646 0.6289 0.1731 -159.6184 -137.8539 -2.0875 -2.0994
0.6238 0.33 1900 0.6394 -0.9314 -1.1088 0.6280 0.1774 -174.0358 -151.8405 -1.9646 -1.9768
0.5824 0.34 2000 0.6345 -0.9755 -1.1825 0.6338 0.2070 -181.4065 -156.2569 -1.9742 -1.9884
0.5895 0.36 2100 0.6449 -0.6585 -0.8078 0.6338 0.1493 -143.9416 -124.5552 -1.9401 -1.9533
0.5633 0.38 2200 0.6434 -0.6348 -0.7894 0.6248 0.1546 -142.1007 -122.1877 -1.9416 -1.9549
0.5459 0.4 2300 0.6320 -1.0811 -1.3378 0.6301 0.2566 -196.9343 -166.8161 -1.6992 -1.7183
0.5786 0.41 2400 0.6306 -1.1984 -1.4632 0.6292 0.2649 -209.4779 -178.5388 -1.6168 -1.6363
0.5679 0.43 2500 0.6330 -0.9020 -1.1230 0.6345 0.2210 -175.4528 -148.9024 -1.7044 -1.7220
0.5426 0.45 2600 0.6352 -0.8874 -1.0910 0.6355 0.2037 -172.2623 -147.4389 -1.7825 -1.7993
0.5888 0.47 2700 0.6303 -0.9094 -1.1295 0.6452 0.2201 -176.1057 -149.6399 -1.8294 -1.8467
0.6328 0.48 2800 0.6316 -0.8366 -1.0424 0.6420 0.2058 -167.4005 -142.3680 -1.8252 -1.8423
0.5746 0.5 2900 0.6267 -1.0547 -1.3045 0.6443 0.2499 -193.6111 -164.1712 -1.7076 -1.7276
0.5452 0.52 3000 0.6288 -0.9206 -1.1541 0.6464 0.2335 -178.5692 -150.7609 -1.7363 -1.7558
0.5525 0.53 3100 0.6231 -1.0317 -1.3080 0.6564 0.2763 -193.9615 -161.8740 -1.6101 -1.6323
0.6097 0.55 3200 0.6201 -1.0912 -1.3708 0.6554 0.2796 -200.2384 -167.8213 -1.5903 -1.6121
0.5807 0.57 3300 0.6239 -1.1017 -1.3657 0.6506 0.2640 -199.7250 -168.8761 -1.5292 -1.5503
0.536 0.59 3400 0.6312 -0.8275 -1.0409 0.6466 0.2134 -167.2509 -141.4572 -1.7056 -1.7240
0.5392 0.6 3500 0.6287 -1.0262 -1.2804 0.6466 0.2542 -191.1944 -161.3248 -1.6386 -1.6596
0.5689 0.62 3600 0.6275 -1.1210 -1.3785 0.6487 0.2574 -201.0063 -170.8087 -1.6286 -1.6494
0.517 0.64 3700 0.6244 -1.2262 -1.5240 0.6566 0.2979 -215.5612 -181.3195 -1.4999 -1.5238
0.5368 0.65 3800 0.6207 -1.2368 -1.5309 0.6580 0.2941 -216.2485 -182.3809 -1.5010 -1.5237
0.5382 0.67 3900 0.6221 -1.0150 -1.2770 0.6596 0.2620 -190.8593 -160.2047 -1.6362 -1.6580
0.5399 0.69 4000 0.6212 -1.1703 -1.4644 0.6599 0.2941 -209.6013 -175.7381 -1.4870 -1.5105
0.5175 0.71 4100 0.6203 -1.2765 -1.5905 0.6554 0.3140 -222.2049 -186.3498 -1.4476 -1.4722
0.5803 0.72 4200 0.6208 -1.3529 -1.6862 0.6624 0.3332 -231.7760 -193.9977 -1.4322 -1.4581
0.507 0.74 4300 0.6265 -0.9361 -1.1863 0.6624 0.2501 -181.7826 -152.3180 -1.5738 -1.5955
0.5273 0.76 4400 0.6211 -1.2719 -1.6087 0.6687 0.3368 -224.0267 -185.8899 -1.4048 -1.4308
0.5574 0.78 4500 0.6233 -1.1065 -1.4002 0.6671 0.2937 -203.1787 -169.3536 -1.4729 -1.4964
0.4819 0.79 4600 0.6219 -1.1036 -1.4017 0.6643 0.2981 -203.3253 -169.0589 -1.5017 -1.5251
0.5187 0.81 4700 0.6172 -1.4659 -1.8338 0.6654 0.3680 -246.5411 -205.2918 -1.3670 -1.3935
0.5805 0.83 4800 0.6146 -1.4235 -1.7810 0.6619 0.3575 -241.2558 -201.0503 -1.4196 -1.4453
0.537 0.84 4900 0.6194 -1.2089 -1.5178 0.6557 0.3089 -214.9402 -179.5929 -1.5222 -1.5460
0.5112 0.86 5000 0.6177 -1.5091 -1.8730 0.6580 0.3638 -250.4540 -209.6180 -1.4013 -1.4276
0.5746 0.88 5100 0.6200 -1.2224 -1.5393 0.6654 0.3168 -217.0836 -180.9476 -1.5328 -1.5572
0.5138 0.9 5200 0.6237 -1.0419 -1.3187 0.6605 0.2768 -195.0258 -162.8902 -1.6006 -1.6232
0.5094 0.91 5300 0.6181 -1.2868 -1.6160 0.6599 0.3293 -224.7612 -187.3815 -1.5180 -1.5428
0.4865 0.93 5400 0.6222 -1.2264 -1.5437 0.6698 0.3173 -217.5302 -181.3466 -1.5197 -1.5443
0.513 0.95 5500 0.6214 -1.1371 -1.4265 0.6722 0.2894 -205.8068 -172.4182 -1.5651 -1.5876
0.5474 0.96 5600 0.6201 -1.1854 -1.4951 0.6689 0.3097 -212.6680 -177.2486 -1.5109 -1.5347
0.5291 0.98 5700 0.6191 -1.1659 -1.4788 0.6696 0.3130 -211.0420 -175.2930 -1.5209 -1.5449
0.496 1.0 5800 0.6148 -1.5172 -1.9032 0.6680 0.3860 -253.4752 -210.4265 -1.4163 -1.4435
0.3739 1.02 5900 0.6216 -1.5454 -1.9612 0.6626 0.4157 -259.2733 -213.2480 -1.3429 -1.3716
0.3835 1.03 6000 0.6214 -1.8273 -2.3125 0.6671 0.4851 -294.4050 -241.4372 -1.2869 -1.3177
0.3822 1.05 6100 0.6230 -2.0009 -2.5009 0.6710 0.4999 -313.2448 -258.7976 -1.2163 -1.2471
0.4249 1.07 6200 0.6216 -1.5166 -1.9264 0.6657 0.4098 -255.7980 -210.3597 -1.4188 -1.4463
0.4731 1.09 6300 0.6206 -1.7045 -2.1531 0.6654 0.4486 -278.4628 -229.1491 -1.3768 -1.4055
0.4089 1.1 6400 0.6263 -1.9433 -2.4330 0.6643 0.4897 -306.4561 -253.0356 -1.2985 -1.3283
0.4055 1.12 6500 0.6263 -1.6156 -2.0285 0.6657 0.4128 -266.0024 -220.2685 -1.4228 -1.4496
0.4373 1.14 6600 0.6319 -1.9163 -2.3889 0.6615 0.4726 -302.0515 -250.3334 -1.3870 -1.4154
0.4568 1.15 6700 0.6347 -1.7086 -2.1521 0.6575 0.4435 -278.3696 -229.5625 -1.4138 -1.4419
0.396 1.17 6800 0.6304 -1.8382 -2.2997 0.6694 0.4614 -293.1244 -242.5259 -1.3792 -1.4074
0.4312 1.19 6900 0.6330 -2.0759 -2.5709 0.6645 0.4950 -320.2516 -266.2965 -1.3565 -1.3853
0.4144 1.21 7000 0.6300 -1.5474 -1.9476 0.6587 0.4001 -257.9128 -213.4480 -1.5128 -1.5385
0.4501 1.22 7100 0.6320 -1.5691 -1.9654 0.6510 0.3963 -259.6932 -215.6143 -1.4579 -1.4834
0.4303 1.24 7200 0.6323 -1.7741 -2.2060 0.6538 0.4319 -283.7571 -236.1103 -1.4104 -1.4369
0.4717 1.26 7300 0.6294 -1.8573 -2.3122 0.6668 0.4549 -294.3745 -244.4295 -1.3985 -1.4254
0.3908 1.27 7400 0.6307 -1.6832 -2.1090 0.6568 0.4258 -274.0572 -227.0262 -1.4235 -1.4501
0.4618 1.29 7500 0.6276 -1.5299 -1.9160 0.6531 0.3861 -254.7590 -211.6911 -1.4812 -1.5060
0.5019 1.31 7600 0.6301 -1.8422 -2.2951 0.6624 0.4529 -292.6649 -242.9215 -1.4008 -1.4277
0.4239 1.33 7700 0.6266 -1.6098 -2.0240 0.6633 0.4142 -265.5571 -219.6812 -1.4540 -1.4801
0.4156 1.34 7800 0.6327 -1.9969 -2.4832 0.6638 0.4864 -311.4807 -258.3907 -1.3619 -1.3900
0.418 1.36 7900 0.6321 -1.7670 -2.2060 0.6578 0.4391 -283.7597 -235.3999 -1.4207 -1.4475
0.4084 1.38 8000 0.6318 -1.8853 -2.3451 0.6638 0.4598 -297.6674 -247.2307 -1.3816 -1.4088
0.4616 1.4 8100 0.6337 -1.6779 -2.0977 0.6564 0.4198 -272.9300 -226.4922 -1.4319 -1.4581
0.4033 1.41 8200 0.6331 -1.8711 -2.3312 0.6638 0.4601 -296.2737 -245.8150 -1.3845 -1.4116
0.4659 1.43 8300 0.6338 -1.9457 -2.4103 0.6643 0.4646 -304.1916 -253.2738 -1.3745 -1.4014
0.4254 1.45 8400 0.6342 -1.7488 -2.1805 0.6589 0.4317 -281.2074 -233.5818 -1.4272 -1.4531
0.4177 1.46 8500 0.6338 -1.7052 -2.1243 0.6589 0.4190 -275.5844 -229.2278 -1.4477 -1.4731
0.4537 1.48 8600 0.6325 -1.8512 -2.2974 0.6678 0.4461 -292.8940 -243.8274 -1.4197 -1.4457
0.4176 1.5 8700 0.6308 -1.7305 -2.1647 0.6654 0.4342 -279.6241 -231.7505 -1.4491 -1.4751
0.4486 1.52 8800 0.6291 -1.7428 -2.1782 0.6694 0.4354 -280.9822 -232.9864 -1.4555 -1.4813
0.3594 1.53 8900 0.6299 -1.9280 -2.3996 0.6675 0.4716 -303.1151 -251.5025 -1.4002 -1.4271
0.4428 1.55 9000 0.6319 -1.8919 -2.3581 0.6643 0.4663 -298.9696 -247.8895 -1.4093 -1.4361
0.4441 1.57 9100 0.6315 -1.7822 -2.2239 0.6671 0.4418 -285.5493 -236.9199 -1.4335 -1.4596
0.3898 1.59 9200 0.6316 -1.7689 -2.2103 0.6657 0.4414 -284.1919 -235.5972 -1.4175 -1.4437
0.3657 1.6 9300 0.6326 -1.8070 -2.2549 0.6638 0.4480 -288.6493 -239.3994 -1.4099 -1.4361
0.4666 1.62 9400 0.6325 -1.7984 -2.2467 0.6631 0.4483 -287.8304 -238.5475 -1.4113 -1.4377
0.3503 1.64 9500 0.6340 -1.9330 -2.4089 0.6587 0.4759 -304.0439 -252.0053 -1.3757 -1.4028
0.3729 1.65 9600 0.6357 -1.9359 -2.4150 0.6564 0.4791 -304.6583 -252.2943 -1.3641 -1.3914
0.4403 1.67 9700 0.6342 -1.8602 -2.3254 0.6624 0.4652 -295.6944 -244.7219 -1.3903 -1.4172
0.3633 1.69 9800 0.6346 -1.8563 -2.3208 0.6589 0.4644 -295.2367 -244.3386 -1.3928 -1.4199
0.3727 1.71 9900 0.6336 -1.8765 -2.3444 0.6557 0.4679 -297.6013 -246.3585 -1.3978 -1.4249
0.424 1.72 10000 0.6344 -1.8698 -2.3349 0.6515 0.4650 -296.6436 -245.6855 -1.3958 -1.4226
0.3867 1.74 10100 0.6348 -1.8396 -2.2973 0.6610 0.4578 -292.8903 -242.6608 -1.4014 -1.4282
0.3851 1.76 10200 0.6358 -1.9589 -2.4446 0.6608 0.4858 -307.6222 -254.5927 -1.3697 -1.3974
0.4322 1.77 10300 0.6352 -1.9333 -2.4122 0.6585 0.4788 -304.3728 -252.0376 -1.3729 -1.4002
0.3405 1.79 10400 0.6352 -1.8857 -2.3538 0.6608 0.4681 -298.5337 -247.2695 -1.3844 -1.4115
0.424 1.81 10500 0.6351 -1.8775 -2.3439 0.6599 0.4665 -297.5495 -246.4502 -1.3843 -1.4113
0.4396 1.83 10600 0.6350 -1.8749 -2.3405 0.6568 0.4655 -297.2035 -246.1965 -1.3876 -1.4146
0.3908 1.84 10700 0.6334 -1.8434 -2.3045 0.6564 0.4611 -293.6068 -243.0424 -1.3944 -1.4212
0.4273 1.86 10800 0.6342 -1.8539 -2.3151 0.6624 0.4611 -294.6657 -244.0978 -1.3926 -1.4194
0.3762 1.88 10900 0.6346 -1.8597 -2.3213 0.6566 0.4616 -295.2873 -244.6704 -1.3904 -1.4173
0.4734 1.9 11000 0.6339 -1.8518 -2.3137 0.6629 0.4619 -294.5248 -243.8795 -1.3920 -1.4190
0.4333 1.91 11100 0.6333 -1.8546 -2.3184 0.6599 0.4638 -294.9983 -244.1649 -1.3921 -1.4190
0.4305 1.93 11200 0.6335 -1.8468 -2.3074 0.6564 0.4606 -293.8987 -243.3866 -1.3953 -1.4221
0.4817 1.95 11300 0.6343 -1.8562 -2.3189 0.6573 0.4627 -295.0477 -244.3265 -1.3934 -1.4203
0.4146 1.96 11400 0.6339 -1.8573 -2.3207 0.6559 0.4634 -295.2255 -244.4316 -1.3909 -1.4179
0.432 1.98 11500 0.6337 -1.8547 -2.3184 0.6536 0.4637 -295.0010 -244.1783 -1.3914 -1.4183
0.429 2.0 11600 0.6342 -1.8568 -2.3204 0.6580 0.4635 -295.1929 -244.3875 -1.3920 -1.4190

Framework versions

  • Transformers 4.39.3
  • PyTorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2