|
2023-10-13 18:40:22,179 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 18:40:22,181 Model: "SequenceTagger( |
|
(embeddings): ByT5Embeddings( |
|
(model): T5EncoderModel( |
|
(shared): Embedding(384, 1472) |
|
(encoder): T5Stack( |
|
(embed_tokens): Embedding(384, 1472) |
|
(block): ModuleList( |
|
(0): T5Block( |
|
(layer): ModuleList( |
|
(0): T5LayerSelfAttention( |
|
(SelfAttention): T5Attention( |
|
(q): Linear(in_features=1472, out_features=384, bias=False) |
|
(k): Linear(in_features=1472, out_features=384, bias=False) |
|
(v): Linear(in_features=1472, out_features=384, bias=False) |
|
(o): Linear(in_features=384, out_features=1472, bias=False) |
|
(relative_attention_bias): Embedding(32, 6) |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
(1): T5LayerFF( |
|
(DenseReluDense): T5DenseGatedActDense( |
|
(wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
(act): NewGELUActivation() |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
(1-11): 11 x T5Block( |
|
(layer): ModuleList( |
|
(0): T5LayerSelfAttention( |
|
(SelfAttention): T5Attention( |
|
(q): Linear(in_features=1472, out_features=384, bias=False) |
|
(k): Linear(in_features=1472, out_features=384, bias=False) |
|
(v): Linear(in_features=1472, out_features=384, bias=False) |
|
(o): Linear(in_features=384, out_features=1472, bias=False) |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
(1): T5LayerFF( |
|
(DenseReluDense): T5DenseGatedActDense( |
|
(wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
(act): NewGELUActivation() |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
) |
|
(final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
(locked_dropout): LockedDropout(p=0.5) |
|
(linear): Linear(in_features=1472, out_features=13, bias=True) |
|
(loss_function): CrossEntropyLoss() |
|
)" |
|
2023-10-13 18:40:22,181 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 18:40:22,181 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences |
|
- NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator |
|
2023-10-13 18:40:22,181 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 18:40:22,181 Train: 14465 sentences |
|
2023-10-13 18:40:22,181 (train_with_dev=False, train_with_test=False) |
|
2023-10-13 18:40:22,182 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 18:40:22,182 Training Params: |
|
2023-10-13 18:40:22,182 - learning_rate: "0.00016" |
|
2023-10-13 18:40:22,182 - mini_batch_size: "4" |
|
2023-10-13 18:40:22,182 - max_epochs: "10" |
|
2023-10-13 18:40:22,182 - shuffle: "True" |
|
2023-10-13 18:40:22,182 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 18:40:22,182 Plugins: |
|
2023-10-13 18:40:22,182 - TensorboardLogger |
|
2023-10-13 18:40:22,182 - LinearScheduler | warmup_fraction: '0.1' |
|
2023-10-13 18:40:22,182 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 18:40:22,182 Final evaluation on model from best epoch (best-model.pt) |
|
2023-10-13 18:40:22,182 - metric: "('micro avg', 'f1-score')" |
|
2023-10-13 18:40:22,182 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 18:40:22,182 Computation: |
|
2023-10-13 18:40:22,183 - compute on device: cuda:0 |
|
2023-10-13 18:40:22,183 - embedding storage: none |
|
2023-10-13 18:40:22,183 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 18:40:22,183 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2" |
|
2023-10-13 18:40:22,183 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 18:40:22,183 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 18:40:22,183 Logging anything other than scalars to TensorBoard is currently not supported. |
|
2023-10-13 18:41:58,767 epoch 1 - iter 361/3617 - loss 2.52124674 - time (sec): 96.58 - samples/sec: 386.79 - lr: 0.000016 - momentum: 0.000000 |
|
2023-10-13 18:43:34,508 epoch 1 - iter 722/3617 - loss 2.11644111 - time (sec): 192.32 - samples/sec: 388.65 - lr: 0.000032 - momentum: 0.000000 |
|
2023-10-13 18:45:09,650 epoch 1 - iter 1083/3617 - loss 1.64036489 - time (sec): 287.46 - samples/sec: 393.68 - lr: 0.000048 - momentum: 0.000000 |
|
2023-10-13 18:46:45,788 epoch 1 - iter 1444/3617 - loss 1.30596637 - time (sec): 383.60 - samples/sec: 393.95 - lr: 0.000064 - momentum: 0.000000 |
|
2023-10-13 18:48:22,571 epoch 1 - iter 1805/3617 - loss 1.08312946 - time (sec): 480.39 - samples/sec: 393.97 - lr: 0.000080 - momentum: 0.000000 |
|
2023-10-13 18:49:59,710 epoch 1 - iter 2166/3617 - loss 0.93558591 - time (sec): 577.53 - samples/sec: 392.82 - lr: 0.000096 - momentum: 0.000000 |
|
2023-10-13 18:51:37,931 epoch 1 - iter 2527/3617 - loss 0.82548930 - time (sec): 675.75 - samples/sec: 392.25 - lr: 0.000112 - momentum: 0.000000 |
|
2023-10-13 18:53:15,367 epoch 1 - iter 2888/3617 - loss 0.73875966 - time (sec): 773.18 - samples/sec: 391.69 - lr: 0.000128 - momentum: 0.000000 |
|
2023-10-13 18:54:56,085 epoch 1 - iter 3249/3617 - loss 0.67039148 - time (sec): 873.90 - samples/sec: 391.02 - lr: 0.000144 - momentum: 0.000000 |
|
2023-10-13 18:56:40,552 epoch 1 - iter 3610/3617 - loss 0.61660340 - time (sec): 978.37 - samples/sec: 387.65 - lr: 0.000160 - momentum: 0.000000 |
|
2023-10-13 18:56:42,400 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 18:56:42,400 EPOCH 1 done: loss 0.6157 - lr: 0.000160 |
|
2023-10-13 18:57:20,914 DEV : loss 0.13260270655155182 - f1-score (micro avg) 0.5508 |
|
2023-10-13 18:57:20,972 saving best model |
|
2023-10-13 18:57:21,837 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 18:59:02,582 epoch 2 - iter 361/3617 - loss 0.11003363 - time (sec): 100.74 - samples/sec: 364.24 - lr: 0.000158 - momentum: 0.000000 |
|
2023-10-13 19:00:41,098 epoch 2 - iter 722/3617 - loss 0.10629388 - time (sec): 199.26 - samples/sec: 374.92 - lr: 0.000156 - momentum: 0.000000 |
|
2023-10-13 19:02:20,551 epoch 2 - iter 1083/3617 - loss 0.10240780 - time (sec): 298.71 - samples/sec: 378.36 - lr: 0.000155 - momentum: 0.000000 |
|
2023-10-13 19:03:57,204 epoch 2 - iter 1444/3617 - loss 0.10038512 - time (sec): 395.36 - samples/sec: 381.86 - lr: 0.000153 - momentum: 0.000000 |
|
2023-10-13 19:05:34,866 epoch 2 - iter 1805/3617 - loss 0.09845307 - time (sec): 493.03 - samples/sec: 385.47 - lr: 0.000151 - momentum: 0.000000 |
|
2023-10-13 19:07:10,754 epoch 2 - iter 2166/3617 - loss 0.09905672 - time (sec): 588.91 - samples/sec: 384.71 - lr: 0.000149 - momentum: 0.000000 |
|
2023-10-13 19:08:47,888 epoch 2 - iter 2527/3617 - loss 0.09733769 - time (sec): 686.05 - samples/sec: 385.00 - lr: 0.000148 - momentum: 0.000000 |
|
2023-10-13 19:10:26,147 epoch 2 - iter 2888/3617 - loss 0.09555977 - time (sec): 784.31 - samples/sec: 386.99 - lr: 0.000146 - momentum: 0.000000 |
|
2023-10-13 19:12:03,836 epoch 2 - iter 3249/3617 - loss 0.09404990 - time (sec): 882.00 - samples/sec: 387.21 - lr: 0.000144 - momentum: 0.000000 |
|
2023-10-13 19:13:40,645 epoch 2 - iter 3610/3617 - loss 0.09376699 - time (sec): 978.81 - samples/sec: 387.37 - lr: 0.000142 - momentum: 0.000000 |
|
2023-10-13 19:13:42,408 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 19:13:42,408 EPOCH 2 done: loss 0.0938 - lr: 0.000142 |
|
2023-10-13 19:14:21,070 DEV : loss 0.1215815320611 - f1-score (micro avg) 0.5926 |
|
2023-10-13 19:14:21,126 saving best model |
|
2023-10-13 19:14:23,664 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 19:16:01,864 epoch 3 - iter 361/3617 - loss 0.05640694 - time (sec): 98.20 - samples/sec: 399.92 - lr: 0.000140 - momentum: 0.000000 |
|
2023-10-13 19:17:37,015 epoch 3 - iter 722/3617 - loss 0.05950424 - time (sec): 193.35 - samples/sec: 392.78 - lr: 0.000139 - momentum: 0.000000 |
|
2023-10-13 19:19:13,647 epoch 3 - iter 1083/3617 - loss 0.06066695 - time (sec): 289.98 - samples/sec: 391.53 - lr: 0.000137 - momentum: 0.000000 |
|
2023-10-13 19:20:51,628 epoch 3 - iter 1444/3617 - loss 0.06135792 - time (sec): 387.96 - samples/sec: 389.69 - lr: 0.000135 - momentum: 0.000000 |
|
2023-10-13 19:22:29,614 epoch 3 - iter 1805/3617 - loss 0.06119400 - time (sec): 485.95 - samples/sec: 390.11 - lr: 0.000133 - momentum: 0.000000 |
|
2023-10-13 19:24:06,053 epoch 3 - iter 2166/3617 - loss 0.06300933 - time (sec): 582.39 - samples/sec: 388.28 - lr: 0.000132 - momentum: 0.000000 |
|
2023-10-13 19:25:44,104 epoch 3 - iter 2527/3617 - loss 0.06310662 - time (sec): 680.44 - samples/sec: 390.67 - lr: 0.000130 - momentum: 0.000000 |
|
2023-10-13 19:27:20,850 epoch 3 - iter 2888/3617 - loss 0.06456942 - time (sec): 777.18 - samples/sec: 388.75 - lr: 0.000128 - momentum: 0.000000 |
|
2023-10-13 19:29:01,123 epoch 3 - iter 3249/3617 - loss 0.06429657 - time (sec): 877.46 - samples/sec: 388.25 - lr: 0.000126 - momentum: 0.000000 |
|
2023-10-13 19:30:43,178 epoch 3 - iter 3610/3617 - loss 0.06441539 - time (sec): 979.51 - samples/sec: 387.28 - lr: 0.000124 - momentum: 0.000000 |
|
2023-10-13 19:30:44,906 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 19:30:44,907 EPOCH 3 done: loss 0.0644 - lr: 0.000124 |
|
2023-10-13 19:31:24,062 DEV : loss 0.1586698442697525 - f1-score (micro avg) 0.6321 |
|
2023-10-13 19:31:24,118 saving best model |
|
2023-10-13 19:31:26,657 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 19:33:04,849 epoch 4 - iter 361/3617 - loss 0.04452930 - time (sec): 98.19 - samples/sec: 378.11 - lr: 0.000123 - momentum: 0.000000 |
|
2023-10-13 19:34:44,327 epoch 4 - iter 722/3617 - loss 0.04066016 - time (sec): 197.67 - samples/sec: 385.17 - lr: 0.000121 - momentum: 0.000000 |
|
2023-10-13 19:36:23,824 epoch 4 - iter 1083/3617 - loss 0.04578705 - time (sec): 297.16 - samples/sec: 382.49 - lr: 0.000119 - momentum: 0.000000 |
|
2023-10-13 19:38:02,132 epoch 4 - iter 1444/3617 - loss 0.04562518 - time (sec): 395.47 - samples/sec: 381.71 - lr: 0.000117 - momentum: 0.000000 |
|
2023-10-13 19:39:39,551 epoch 4 - iter 1805/3617 - loss 0.04646266 - time (sec): 492.89 - samples/sec: 383.13 - lr: 0.000116 - momentum: 0.000000 |
|
2023-10-13 19:41:15,820 epoch 4 - iter 2166/3617 - loss 0.04519717 - time (sec): 589.16 - samples/sec: 384.26 - lr: 0.000114 - momentum: 0.000000 |
|
2023-10-13 19:42:55,082 epoch 4 - iter 2527/3617 - loss 0.04533348 - time (sec): 688.42 - samples/sec: 383.51 - lr: 0.000112 - momentum: 0.000000 |
|
2023-10-13 19:44:36,264 epoch 4 - iter 2888/3617 - loss 0.04473135 - time (sec): 789.60 - samples/sec: 382.69 - lr: 0.000110 - momentum: 0.000000 |
|
2023-10-13 19:46:16,493 epoch 4 - iter 3249/3617 - loss 0.04510623 - time (sec): 889.83 - samples/sec: 383.59 - lr: 0.000108 - momentum: 0.000000 |
|
2023-10-13 19:47:53,562 epoch 4 - iter 3610/3617 - loss 0.04637355 - time (sec): 986.90 - samples/sec: 384.36 - lr: 0.000107 - momentum: 0.000000 |
|
2023-10-13 19:47:55,182 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 19:47:55,182 EPOCH 4 done: loss 0.0464 - lr: 0.000107 |
|
2023-10-13 19:48:34,460 DEV : loss 0.2130165547132492 - f1-score (micro avg) 0.6471 |
|
2023-10-13 19:48:34,518 saving best model |
|
2023-10-13 19:48:37,082 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 19:50:14,518 epoch 5 - iter 361/3617 - loss 0.02758511 - time (sec): 97.43 - samples/sec: 397.38 - lr: 0.000105 - momentum: 0.000000 |
|
2023-10-13 19:51:50,077 epoch 5 - iter 722/3617 - loss 0.02910352 - time (sec): 192.99 - samples/sec: 401.98 - lr: 0.000103 - momentum: 0.000000 |
|
2023-10-13 19:53:24,280 epoch 5 - iter 1083/3617 - loss 0.02918864 - time (sec): 287.19 - samples/sec: 398.11 - lr: 0.000101 - momentum: 0.000000 |
|
2023-10-13 19:55:00,737 epoch 5 - iter 1444/3617 - loss 0.03179865 - time (sec): 383.65 - samples/sec: 400.35 - lr: 0.000100 - momentum: 0.000000 |
|
2023-10-13 19:56:40,217 epoch 5 - iter 1805/3617 - loss 0.03047882 - time (sec): 483.13 - samples/sec: 397.62 - lr: 0.000098 - momentum: 0.000000 |
|
2023-10-13 19:58:17,353 epoch 5 - iter 2166/3617 - loss 0.03136089 - time (sec): 580.27 - samples/sec: 393.32 - lr: 0.000096 - momentum: 0.000000 |
|
2023-10-13 19:59:58,091 epoch 5 - iter 2527/3617 - loss 0.03107949 - time (sec): 681.01 - samples/sec: 391.65 - lr: 0.000094 - momentum: 0.000000 |
|
2023-10-13 20:01:34,403 epoch 5 - iter 2888/3617 - loss 0.03125495 - time (sec): 777.32 - samples/sec: 392.23 - lr: 0.000092 - momentum: 0.000000 |
|
2023-10-13 20:03:15,510 epoch 5 - iter 3249/3617 - loss 0.03139656 - time (sec): 878.42 - samples/sec: 388.36 - lr: 0.000091 - momentum: 0.000000 |
|
2023-10-13 20:04:55,226 epoch 5 - iter 3610/3617 - loss 0.03151360 - time (sec): 978.14 - samples/sec: 387.71 - lr: 0.000089 - momentum: 0.000000 |
|
2023-10-13 20:04:56,955 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 20:04:56,955 EPOCH 5 done: loss 0.0316 - lr: 0.000089 |
|
2023-10-13 20:05:37,673 DEV : loss 0.2452983260154724 - f1-score (micro avg) 0.6203 |
|
2023-10-13 20:05:37,733 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 20:07:19,848 epoch 6 - iter 361/3617 - loss 0.01651558 - time (sec): 102.11 - samples/sec: 372.95 - lr: 0.000087 - momentum: 0.000000 |
|
2023-10-13 20:08:58,158 epoch 6 - iter 722/3617 - loss 0.01869225 - time (sec): 200.42 - samples/sec: 375.10 - lr: 0.000085 - momentum: 0.000000 |
|
2023-10-13 20:10:36,227 epoch 6 - iter 1083/3617 - loss 0.01996453 - time (sec): 298.49 - samples/sec: 375.91 - lr: 0.000084 - momentum: 0.000000 |
|
2023-10-13 20:12:14,888 epoch 6 - iter 1444/3617 - loss 0.02211179 - time (sec): 397.15 - samples/sec: 379.66 - lr: 0.000082 - momentum: 0.000000 |
|
2023-10-13 20:13:51,084 epoch 6 - iter 1805/3617 - loss 0.02255716 - time (sec): 493.35 - samples/sec: 381.30 - lr: 0.000080 - momentum: 0.000000 |
|
2023-10-13 20:15:26,711 epoch 6 - iter 2166/3617 - loss 0.02243773 - time (sec): 588.98 - samples/sec: 383.43 - lr: 0.000078 - momentum: 0.000000 |
|
2023-10-13 20:17:02,468 epoch 6 - iter 2527/3617 - loss 0.02283358 - time (sec): 684.73 - samples/sec: 385.02 - lr: 0.000076 - momentum: 0.000000 |
|
2023-10-13 20:18:38,763 epoch 6 - iter 2888/3617 - loss 0.02312108 - time (sec): 781.03 - samples/sec: 387.56 - lr: 0.000075 - momentum: 0.000000 |
|
2023-10-13 20:20:14,206 epoch 6 - iter 3249/3617 - loss 0.02327124 - time (sec): 876.47 - samples/sec: 388.97 - lr: 0.000073 - momentum: 0.000000 |
|
2023-10-13 20:21:49,847 epoch 6 - iter 3610/3617 - loss 0.02419111 - time (sec): 972.11 - samples/sec: 390.08 - lr: 0.000071 - momentum: 0.000000 |
|
2023-10-13 20:21:51,535 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 20:21:51,535 EPOCH 6 done: loss 0.0242 - lr: 0.000071 |
|
2023-10-13 20:22:30,924 DEV : loss 0.27563825249671936 - f1-score (micro avg) 0.6231 |
|
2023-10-13 20:22:30,982 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 20:24:08,378 epoch 7 - iter 361/3617 - loss 0.01521082 - time (sec): 97.39 - samples/sec: 396.60 - lr: 0.000069 - momentum: 0.000000 |
|
2023-10-13 20:25:46,638 epoch 7 - iter 722/3617 - loss 0.01323902 - time (sec): 195.65 - samples/sec: 389.93 - lr: 0.000068 - momentum: 0.000000 |
|
2023-10-13 20:27:24,850 epoch 7 - iter 1083/3617 - loss 0.01408018 - time (sec): 293.87 - samples/sec: 392.34 - lr: 0.000066 - momentum: 0.000000 |
|
2023-10-13 20:29:00,668 epoch 7 - iter 1444/3617 - loss 0.01314577 - time (sec): 389.68 - samples/sec: 389.85 - lr: 0.000064 - momentum: 0.000000 |
|
2023-10-13 20:30:36,682 epoch 7 - iter 1805/3617 - loss 0.01419142 - time (sec): 485.70 - samples/sec: 391.03 - lr: 0.000062 - momentum: 0.000000 |
|
2023-10-13 20:32:12,581 epoch 7 - iter 2166/3617 - loss 0.01554244 - time (sec): 581.60 - samples/sec: 394.58 - lr: 0.000060 - momentum: 0.000000 |
|
2023-10-13 20:33:47,873 epoch 7 - iter 2527/3617 - loss 0.01578059 - time (sec): 676.89 - samples/sec: 394.74 - lr: 0.000059 - momentum: 0.000000 |
|
2023-10-13 20:35:23,614 epoch 7 - iter 2888/3617 - loss 0.01553333 - time (sec): 772.63 - samples/sec: 393.38 - lr: 0.000057 - momentum: 0.000000 |
|
2023-10-13 20:36:59,404 epoch 7 - iter 3249/3617 - loss 0.01558103 - time (sec): 868.42 - samples/sec: 392.48 - lr: 0.000055 - momentum: 0.000000 |
|
2023-10-13 20:38:37,239 epoch 7 - iter 3610/3617 - loss 0.01537530 - time (sec): 966.25 - samples/sec: 392.34 - lr: 0.000053 - momentum: 0.000000 |
|
2023-10-13 20:38:39,132 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 20:38:39,133 EPOCH 7 done: loss 0.0153 - lr: 0.000053 |
|
2023-10-13 20:39:17,634 DEV : loss 0.33923789858818054 - f1-score (micro avg) 0.6456 |
|
2023-10-13 20:39:17,691 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 20:40:58,364 epoch 8 - iter 361/3617 - loss 0.01277912 - time (sec): 100.67 - samples/sec: 378.20 - lr: 0.000052 - momentum: 0.000000 |
|
2023-10-13 20:42:39,601 epoch 8 - iter 722/3617 - loss 0.01152385 - time (sec): 201.91 - samples/sec: 384.45 - lr: 0.000050 - momentum: 0.000000 |
|
2023-10-13 20:44:18,870 epoch 8 - iter 1083/3617 - loss 0.01084500 - time (sec): 301.18 - samples/sec: 385.62 - lr: 0.000048 - momentum: 0.000000 |
|
2023-10-13 20:45:57,550 epoch 8 - iter 1444/3617 - loss 0.00969400 - time (sec): 399.86 - samples/sec: 386.70 - lr: 0.000046 - momentum: 0.000000 |
|
2023-10-13 20:47:33,176 epoch 8 - iter 1805/3617 - loss 0.00998289 - time (sec): 495.48 - samples/sec: 385.30 - lr: 0.000044 - momentum: 0.000000 |
|
2023-10-13 20:49:09,788 epoch 8 - iter 2166/3617 - loss 0.01065515 - time (sec): 592.09 - samples/sec: 388.20 - lr: 0.000043 - momentum: 0.000000 |
|
2023-10-13 20:50:44,987 epoch 8 - iter 2527/3617 - loss 0.01060654 - time (sec): 687.29 - samples/sec: 387.95 - lr: 0.000041 - momentum: 0.000000 |
|
2023-10-13 20:52:20,864 epoch 8 - iter 2888/3617 - loss 0.01058643 - time (sec): 783.17 - samples/sec: 388.32 - lr: 0.000039 - momentum: 0.000000 |
|
2023-10-13 20:53:59,752 epoch 8 - iter 3249/3617 - loss 0.01058392 - time (sec): 882.06 - samples/sec: 388.07 - lr: 0.000037 - momentum: 0.000000 |
|
2023-10-13 20:55:42,153 epoch 8 - iter 3610/3617 - loss 0.01028093 - time (sec): 984.46 - samples/sec: 385.49 - lr: 0.000036 - momentum: 0.000000 |
|
2023-10-13 20:55:43,713 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 20:55:43,713 EPOCH 8 done: loss 0.0103 - lr: 0.000036 |
|
2023-10-13 20:56:23,186 DEV : loss 0.33330851793289185 - f1-score (micro avg) 0.6595 |
|
2023-10-13 20:56:23,252 saving best model |
|
2023-10-13 20:56:25,832 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 20:58:02,028 epoch 9 - iter 361/3617 - loss 0.00470043 - time (sec): 96.19 - samples/sec: 378.73 - lr: 0.000034 - momentum: 0.000000 |
|
2023-10-13 20:59:40,260 epoch 9 - iter 722/3617 - loss 0.00531921 - time (sec): 194.42 - samples/sec: 384.79 - lr: 0.000032 - momentum: 0.000000 |
|
2023-10-13 21:01:18,063 epoch 9 - iter 1083/3617 - loss 0.00654361 - time (sec): 292.22 - samples/sec: 387.48 - lr: 0.000030 - momentum: 0.000000 |
|
2023-10-13 21:02:57,618 epoch 9 - iter 1444/3617 - loss 0.00676990 - time (sec): 391.78 - samples/sec: 385.89 - lr: 0.000028 - momentum: 0.000000 |
|
2023-10-13 21:04:40,159 epoch 9 - iter 1805/3617 - loss 0.00686632 - time (sec): 494.32 - samples/sec: 383.67 - lr: 0.000027 - momentum: 0.000000 |
|
2023-10-13 21:06:17,098 epoch 9 - iter 2166/3617 - loss 0.00685262 - time (sec): 591.26 - samples/sec: 385.86 - lr: 0.000025 - momentum: 0.000000 |
|
2023-10-13 21:07:52,623 epoch 9 - iter 2527/3617 - loss 0.00658488 - time (sec): 686.78 - samples/sec: 386.29 - lr: 0.000023 - momentum: 0.000000 |
|
2023-10-13 21:09:28,784 epoch 9 - iter 2888/3617 - loss 0.00668225 - time (sec): 782.95 - samples/sec: 384.96 - lr: 0.000021 - momentum: 0.000000 |
|
2023-10-13 21:11:07,550 epoch 9 - iter 3249/3617 - loss 0.00659644 - time (sec): 881.71 - samples/sec: 385.57 - lr: 0.000020 - momentum: 0.000000 |
|
2023-10-13 21:12:45,880 epoch 9 - iter 3610/3617 - loss 0.00676788 - time (sec): 980.04 - samples/sec: 386.92 - lr: 0.000018 - momentum: 0.000000 |
|
2023-10-13 21:12:47,696 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 21:12:47,696 EPOCH 9 done: loss 0.0068 - lr: 0.000018 |
|
2023-10-13 21:13:27,417 DEV : loss 0.3747117519378662 - f1-score (micro avg) 0.6531 |
|
2023-10-13 21:13:27,476 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 21:15:06,935 epoch 10 - iter 361/3617 - loss 0.00214926 - time (sec): 99.46 - samples/sec: 384.82 - lr: 0.000016 - momentum: 0.000000 |
|
2023-10-13 21:16:45,731 epoch 10 - iter 722/3617 - loss 0.00207497 - time (sec): 198.25 - samples/sec: 383.67 - lr: 0.000014 - momentum: 0.000000 |
|
2023-10-13 21:18:22,667 epoch 10 - iter 1083/3617 - loss 0.00303010 - time (sec): 295.19 - samples/sec: 384.39 - lr: 0.000012 - momentum: 0.000000 |
|
2023-10-13 21:20:05,149 epoch 10 - iter 1444/3617 - loss 0.00354649 - time (sec): 397.67 - samples/sec: 382.40 - lr: 0.000011 - momentum: 0.000000 |
|
2023-10-13 21:21:42,479 epoch 10 - iter 1805/3617 - loss 0.00366430 - time (sec): 495.00 - samples/sec: 382.22 - lr: 0.000009 - momentum: 0.000000 |
|
2023-10-13 21:23:21,257 epoch 10 - iter 2166/3617 - loss 0.00439229 - time (sec): 593.78 - samples/sec: 382.12 - lr: 0.000007 - momentum: 0.000000 |
|
2023-10-13 21:25:03,186 epoch 10 - iter 2527/3617 - loss 0.00430784 - time (sec): 695.71 - samples/sec: 381.88 - lr: 0.000005 - momentum: 0.000000 |
|
2023-10-13 21:26:45,484 epoch 10 - iter 2888/3617 - loss 0.00395290 - time (sec): 798.01 - samples/sec: 381.84 - lr: 0.000004 - momentum: 0.000000 |
|
2023-10-13 21:28:24,586 epoch 10 - iter 3249/3617 - loss 0.00388602 - time (sec): 897.11 - samples/sec: 379.67 - lr: 0.000002 - momentum: 0.000000 |
|
2023-10-13 21:30:08,585 epoch 10 - iter 3610/3617 - loss 0.00392291 - time (sec): 1001.11 - samples/sec: 378.90 - lr: 0.000000 - momentum: 0.000000 |
|
2023-10-13 21:30:10,346 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 21:30:10,347 EPOCH 10 done: loss 0.0039 - lr: 0.000000 |
|
2023-10-13 21:30:52,851 DEV : loss 0.3852955400943756 - f1-score (micro avg) 0.6562 |
|
2023-10-13 21:30:53,782 ---------------------------------------------------------------------------------------------------- |
|
2023-10-13 21:30:53,784 Loading model from best epoch ... |
|
2023-10-13 21:30:57,709 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org |
|
2023-10-13 21:31:56,313 |
|
Results: |
|
- F-score (micro) 0.6292 |
|
- F-score (macro) 0.4868 |
|
- Accuracy 0.4702 |
|
|
|
By class: |
|
              precision    recall  f1-score   support |
|


         loc     0.6219    0.7597    0.6839       591 |
|
        pers     0.5708    0.7003    0.6289       357 |
|
         org     0.1571    0.1392    0.1477        79 |
|


   micro avg     0.5772    0.6913    0.6292      1027 |
|
   macro avg     0.4499    0.5331    0.4868      1027 |
|
weighted avg     0.5684    0.6913    0.6236      1027 |
|
|
|
2023-10-13 21:31:56,313 ---------------------------------------------------------------------------------------------------- |
|
|