2023-10-15 15:09:10,345 ----------------------------------------------------------------------------------------------------
2023-10-15 15:09:10,346 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-15 15:09:10,347 ----------------------------------------------------------------------------------------------------
2023-10-15 15:09:10,347 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-15 15:09:10,347 ----------------------------------------------------------------------------------------------------
2023-10-15 15:09:10,347 Train: 20847 sentences
2023-10-15 15:09:10,347 (train_with_dev=False, train_with_test=False)
2023-10-15 15:09:10,347 ----------------------------------------------------------------------------------------------------
2023-10-15 15:09:10,347 Training Params:
2023-10-15 15:09:10,347 - learning_rate: "3e-05"
2023-10-15 15:09:10,347 - mini_batch_size: "4"
2023-10-15 15:09:10,347 - max_epochs: "10"
2023-10-15 15:09:10,347 - shuffle: "True"
2023-10-15 15:09:10,347 ----------------------------------------------------------------------------------------------------
2023-10-15 15:09:10,347 Plugins:
2023-10-15 15:09:10,347 - LinearScheduler | warmup_fraction: '0.1'
2023-10-15 15:09:10,347 ----------------------------------------------------------------------------------------------------
2023-10-15 15:09:10,347 Final evaluation on model from best epoch (best-model.pt)
2023-10-15 15:09:10,347 - metric: "('micro avg', 'f1-score')"
2023-10-15 15:09:10,347 ----------------------------------------------------------------------------------------------------
2023-10-15 15:09:10,347 Computation:
2023-10-15 15:09:10,347 - compute on device: cuda:0
2023-10-15 15:09:10,347 - embedding storage: none
2023-10-15 15:09:10,347 ----------------------------------------------------------------------------------------------------
2023-10-15 15:09:10,347 Model training base path: "hmbench-newseye/de-dbmdz/bert-base-historic-multilingual-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-2"
2023-10-15 15:09:10,347 ----------------------------------------------------------------------------------------------------
2023-10-15 15:09:10,347 ----------------------------------------------------------------------------------------------------
2023-10-15 15:09:36,345 epoch 1 - iter 521/5212 - loss 1.70001218 - time (sec): 26.00 - samples/sec: 
1455.46 - lr: 0.000003 - momentum: 0.000000 2023-10-15 15:10:02,999 epoch 1 - iter 1042/5212 - loss 1.06814573 - time (sec): 52.65 - samples/sec: 1418.65 - lr: 0.000006 - momentum: 0.000000 2023-10-15 15:10:29,184 epoch 1 - iter 1563/5212 - loss 0.80933221 - time (sec): 78.84 - samples/sec: 1453.59 - lr: 0.000009 - momentum: 0.000000 2023-10-15 15:10:54,029 epoch 1 - iter 2084/5212 - loss 0.69069653 - time (sec): 103.68 - samples/sec: 1449.58 - lr: 0.000012 - momentum: 0.000000 2023-10-15 15:11:18,988 epoch 1 - iter 2605/5212 - loss 0.60440945 - time (sec): 128.64 - samples/sec: 1454.63 - lr: 0.000015 - momentum: 0.000000 2023-10-15 15:11:43,958 epoch 1 - iter 3126/5212 - loss 0.54666260 - time (sec): 153.61 - samples/sec: 1454.40 - lr: 0.000018 - momentum: 0.000000 2023-10-15 15:12:09,304 epoch 1 - iter 3647/5212 - loss 0.50222284 - time (sec): 178.96 - samples/sec: 1447.75 - lr: 0.000021 - momentum: 0.000000 2023-10-15 15:12:34,296 epoch 1 - iter 4168/5212 - loss 0.47277877 - time (sec): 203.95 - samples/sec: 1441.16 - lr: 0.000024 - momentum: 0.000000 2023-10-15 15:12:59,391 epoch 1 - iter 4689/5212 - loss 0.44260534 - time (sec): 229.04 - samples/sec: 1444.12 - lr: 0.000027 - momentum: 0.000000 2023-10-15 15:13:25,052 epoch 1 - iter 5210/5212 - loss 0.41994283 - time (sec): 254.70 - samples/sec: 1442.37 - lr: 0.000030 - momentum: 0.000000 2023-10-15 15:13:25,147 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:13:25,148 EPOCH 1 done: loss 0.4199 - lr: 0.000030 2023-10-15 15:13:30,935 DEV : loss 0.2363644540309906 - f1-score (micro avg) 0.2843 2023-10-15 15:13:30,961 saving best model 2023-10-15 15:13:31,321 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:13:56,373 epoch 2 - iter 521/5212 - loss 0.17672925 - time (sec): 25.05 - samples/sec: 1403.82 - lr: 0.000030 - momentum: 0.000000 2023-10-15 15:14:21,203 epoch 2 - iter 1042/5212 
- loss 0.16983337 - time (sec): 49.88 - samples/sec: 1429.70 - lr: 0.000029 - momentum: 0.000000 2023-10-15 15:14:46,722 epoch 2 - iter 1563/5212 - loss 0.18368285 - time (sec): 75.40 - samples/sec: 1445.90 - lr: 0.000029 - momentum: 0.000000 2023-10-15 15:15:11,777 epoch 2 - iter 2084/5212 - loss 0.17905217 - time (sec): 100.45 - samples/sec: 1443.45 - lr: 0.000029 - momentum: 0.000000 2023-10-15 15:15:36,664 epoch 2 - iter 2605/5212 - loss 0.17232351 - time (sec): 125.34 - samples/sec: 1455.15 - lr: 0.000028 - momentum: 0.000000 2023-10-15 15:16:02,030 epoch 2 - iter 3126/5212 - loss 0.17048500 - time (sec): 150.71 - samples/sec: 1460.32 - lr: 0.000028 - momentum: 0.000000 2023-10-15 15:16:26,925 epoch 2 - iter 3647/5212 - loss 0.17267205 - time (sec): 175.60 - samples/sec: 1461.13 - lr: 0.000028 - momentum: 0.000000 2023-10-15 15:16:52,295 epoch 2 - iter 4168/5212 - loss 0.17077349 - time (sec): 200.97 - samples/sec: 1466.42 - lr: 0.000027 - momentum: 0.000000 2023-10-15 15:17:17,453 epoch 2 - iter 4689/5212 - loss 0.17105767 - time (sec): 226.13 - samples/sec: 1464.23 - lr: 0.000027 - momentum: 0.000000 2023-10-15 15:17:42,380 epoch 2 - iter 5210/5212 - loss 0.16953223 - time (sec): 251.06 - samples/sec: 1462.79 - lr: 0.000027 - momentum: 0.000000 2023-10-15 15:17:42,472 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:17:42,472 EPOCH 2 done: loss 0.1695 - lr: 0.000027 2023-10-15 15:17:51,427 DEV : loss 0.16215363144874573 - f1-score (micro avg) 0.3835 2023-10-15 15:17:51,453 saving best model 2023-10-15 15:17:51,895 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:18:17,980 epoch 3 - iter 521/5212 - loss 0.10469666 - time (sec): 26.08 - samples/sec: 1494.63 - lr: 0.000026 - momentum: 0.000000 2023-10-15 15:18:43,907 epoch 3 - iter 1042/5212 - loss 0.11123288 - time (sec): 52.01 - samples/sec: 1491.14 - lr: 0.000026 - momentum: 
0.000000 2023-10-15 15:19:08,918 epoch 3 - iter 1563/5212 - loss 0.11762357 - time (sec): 77.02 - samples/sec: 1491.41 - lr: 0.000026 - momentum: 0.000000 2023-10-15 15:19:33,852 epoch 3 - iter 2084/5212 - loss 0.12270683 - time (sec): 101.95 - samples/sec: 1475.35 - lr: 0.000025 - momentum: 0.000000 2023-10-15 15:19:59,052 epoch 3 - iter 2605/5212 - loss 0.12115573 - time (sec): 127.16 - samples/sec: 1483.64 - lr: 0.000025 - momentum: 0.000000 2023-10-15 15:20:23,967 epoch 3 - iter 3126/5212 - loss 0.12016284 - time (sec): 152.07 - samples/sec: 1467.39 - lr: 0.000025 - momentum: 0.000000 2023-10-15 15:20:49,519 epoch 3 - iter 3647/5212 - loss 0.12039286 - time (sec): 177.62 - samples/sec: 1453.89 - lr: 0.000024 - momentum: 0.000000 2023-10-15 15:21:13,549 epoch 3 - iter 4168/5212 - loss 0.11951394 - time (sec): 201.65 - samples/sec: 1453.79 - lr: 0.000024 - momentum: 0.000000 2023-10-15 15:21:37,889 epoch 3 - iter 4689/5212 - loss 0.11769963 - time (sec): 225.99 - samples/sec: 1464.75 - lr: 0.000024 - momentum: 0.000000 2023-10-15 15:22:01,759 epoch 3 - iter 5210/5212 - loss 0.11590928 - time (sec): 249.86 - samples/sec: 1470.09 - lr: 0.000023 - momentum: 0.000000 2023-10-15 15:22:01,847 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:22:01,847 EPOCH 3 done: loss 0.1159 - lr: 0.000023 2023-10-15 15:22:10,885 DEV : loss 0.3360123932361603 - f1-score (micro avg) 0.3377 2023-10-15 15:22:10,912 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:22:36,517 epoch 4 - iter 521/5212 - loss 0.09249251 - time (sec): 25.60 - samples/sec: 1518.66 - lr: 0.000023 - momentum: 0.000000 2023-10-15 15:23:01,241 epoch 4 - iter 1042/5212 - loss 0.09120710 - time (sec): 50.33 - samples/sec: 1442.37 - lr: 0.000023 - momentum: 0.000000 2023-10-15 15:23:26,282 epoch 4 - iter 1563/5212 - loss 0.08893166 - time (sec): 75.37 - samples/sec: 1456.98 - lr: 0.000022 
- momentum: 0.000000 2023-10-15 15:23:50,937 epoch 4 - iter 2084/5212 - loss 0.08371995 - time (sec): 100.02 - samples/sec: 1452.06 - lr: 0.000022 - momentum: 0.000000 2023-10-15 15:24:15,942 epoch 4 - iter 2605/5212 - loss 0.08118124 - time (sec): 125.03 - samples/sec: 1456.63 - lr: 0.000022 - momentum: 0.000000 2023-10-15 15:24:41,451 epoch 4 - iter 3126/5212 - loss 0.08066818 - time (sec): 150.54 - samples/sec: 1459.68 - lr: 0.000021 - momentum: 0.000000 2023-10-15 15:25:06,504 epoch 4 - iter 3647/5212 - loss 0.08362064 - time (sec): 175.59 - samples/sec: 1457.66 - lr: 0.000021 - momentum: 0.000000 2023-10-15 15:25:31,596 epoch 4 - iter 4168/5212 - loss 0.08316214 - time (sec): 200.68 - samples/sec: 1455.02 - lr: 0.000021 - momentum: 0.000000 2023-10-15 15:25:57,067 epoch 4 - iter 4689/5212 - loss 0.08345938 - time (sec): 226.15 - samples/sec: 1457.65 - lr: 0.000020 - momentum: 0.000000 2023-10-15 15:26:22,539 epoch 4 - iter 5210/5212 - loss 0.08274423 - time (sec): 251.63 - samples/sec: 1459.97 - lr: 0.000020 - momentum: 0.000000 2023-10-15 15:26:22,628 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:26:22,628 EPOCH 4 done: loss 0.0828 - lr: 0.000020 2023-10-15 15:26:31,008 DEV : loss 0.28332096338272095 - f1-score (micro avg) 0.3231 2023-10-15 15:26:31,039 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:26:57,145 epoch 5 - iter 521/5212 - loss 0.05268117 - time (sec): 26.11 - samples/sec: 1408.81 - lr: 0.000020 - momentum: 0.000000 2023-10-15 15:27:23,828 epoch 5 - iter 1042/5212 - loss 0.05092813 - time (sec): 52.79 - samples/sec: 1481.48 - lr: 0.000019 - momentum: 0.000000 2023-10-15 15:27:49,128 epoch 5 - iter 1563/5212 - loss 0.05327142 - time (sec): 78.09 - samples/sec: 1456.97 - lr: 0.000019 - momentum: 0.000000 2023-10-15 15:28:15,064 epoch 5 - iter 2084/5212 - loss 0.05794021 - time (sec): 104.02 - samples/sec: 1466.54 - 
lr: 0.000019 - momentum: 0.000000 2023-10-15 15:28:40,167 epoch 5 - iter 2605/5212 - loss 0.05639404 - time (sec): 129.13 - samples/sec: 1456.04 - lr: 0.000018 - momentum: 0.000000 2023-10-15 15:29:05,571 epoch 5 - iter 3126/5212 - loss 0.05672885 - time (sec): 154.53 - samples/sec: 1464.34 - lr: 0.000018 - momentum: 0.000000 2023-10-15 15:29:30,847 epoch 5 - iter 3647/5212 - loss 0.05616822 - time (sec): 179.81 - samples/sec: 1451.42 - lr: 0.000018 - momentum: 0.000000 2023-10-15 15:29:55,838 epoch 5 - iter 4168/5212 - loss 0.05709755 - time (sec): 204.80 - samples/sec: 1446.81 - lr: 0.000017 - momentum: 0.000000 2023-10-15 15:30:21,191 epoch 5 - iter 4689/5212 - loss 0.05865289 - time (sec): 230.15 - samples/sec: 1451.81 - lr: 0.000017 - momentum: 0.000000 2023-10-15 15:30:45,876 epoch 5 - iter 5210/5212 - loss 0.05877178 - time (sec): 254.84 - samples/sec: 1441.30 - lr: 0.000017 - momentum: 0.000000 2023-10-15 15:30:45,974 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:30:45,974 EPOCH 5 done: loss 0.0588 - lr: 0.000017 2023-10-15 15:30:54,204 DEV : loss 0.3178791105747223 - f1-score (micro avg) 0.3851 2023-10-15 15:30:54,233 saving best model 2023-10-15 15:30:54,606 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:31:20,370 epoch 6 - iter 521/5212 - loss 0.03111639 - time (sec): 25.76 - samples/sec: 1497.34 - lr: 0.000016 - momentum: 0.000000 2023-10-15 15:31:45,536 epoch 6 - iter 1042/5212 - loss 0.04174411 - time (sec): 50.93 - samples/sec: 1462.79 - lr: 0.000016 - momentum: 0.000000 2023-10-15 15:32:10,834 epoch 6 - iter 1563/5212 - loss 0.03928689 - time (sec): 76.23 - samples/sec: 1467.34 - lr: 0.000016 - momentum: 0.000000 2023-10-15 15:32:36,518 epoch 6 - iter 2084/5212 - loss 0.04056787 - time (sec): 101.91 - samples/sec: 1437.64 - lr: 0.000015 - momentum: 0.000000 2023-10-15 15:33:01,811 epoch 6 - iter 2605/5212 - loss 
0.03954421 - time (sec): 127.20 - samples/sec: 1446.65 - lr: 0.000015 - momentum: 0.000000 2023-10-15 15:33:27,381 epoch 6 - iter 3126/5212 - loss 0.04043613 - time (sec): 152.77 - samples/sec: 1450.98 - lr: 0.000015 - momentum: 0.000000 2023-10-15 15:33:53,366 epoch 6 - iter 3647/5212 - loss 0.04034187 - time (sec): 178.76 - samples/sec: 1465.42 - lr: 0.000014 - momentum: 0.000000 2023-10-15 15:34:18,104 epoch 6 - iter 4168/5212 - loss 0.04068735 - time (sec): 203.50 - samples/sec: 1453.71 - lr: 0.000014 - momentum: 0.000000 2023-10-15 15:34:42,838 epoch 6 - iter 4689/5212 - loss 0.04106378 - time (sec): 228.23 - samples/sec: 1454.69 - lr: 0.000014 - momentum: 0.000000 2023-10-15 15:35:07,496 epoch 6 - iter 5210/5212 - loss 0.04133139 - time (sec): 252.89 - samples/sec: 1452.72 - lr: 0.000013 - momentum: 0.000000 2023-10-15 15:35:07,587 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:35:07,587 EPOCH 6 done: loss 0.0413 - lr: 0.000013 2023-10-15 15:35:16,109 DEV : loss 0.298368901014328 - f1-score (micro avg) 0.4014 2023-10-15 15:35:16,143 saving best model 2023-10-15 15:35:16,629 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:35:41,719 epoch 7 - iter 521/5212 - loss 0.02768974 - time (sec): 25.09 - samples/sec: 1345.24 - lr: 0.000013 - momentum: 0.000000 2023-10-15 15:36:06,868 epoch 7 - iter 1042/5212 - loss 0.03254404 - time (sec): 50.24 - samples/sec: 1382.74 - lr: 0.000013 - momentum: 0.000000 2023-10-15 15:36:33,160 epoch 7 - iter 1563/5212 - loss 0.03323169 - time (sec): 76.53 - samples/sec: 1414.01 - lr: 0.000012 - momentum: 0.000000 2023-10-15 15:36:58,282 epoch 7 - iter 2084/5212 - loss 0.03438222 - time (sec): 101.65 - samples/sec: 1407.75 - lr: 0.000012 - momentum: 0.000000 2023-10-15 15:37:23,204 epoch 7 - iter 2605/5212 - loss 0.03394163 - time (sec): 126.57 - samples/sec: 1399.38 - lr: 0.000012 - momentum: 0.000000 
2023-10-15 15:37:48,331 epoch 7 - iter 3126/5212 - loss 0.03392244 - time (sec): 151.70 - samples/sec: 1411.36 - lr: 0.000011 - momentum: 0.000000 2023-10-15 15:38:14,730 epoch 7 - iter 3647/5212 - loss 0.03283761 - time (sec): 178.10 - samples/sec: 1410.94 - lr: 0.000011 - momentum: 0.000000 2023-10-15 15:38:41,408 epoch 7 - iter 4168/5212 - loss 0.03197315 - time (sec): 204.78 - samples/sec: 1418.09 - lr: 0.000011 - momentum: 0.000000 2023-10-15 15:39:07,470 epoch 7 - iter 4689/5212 - loss 0.03103844 - time (sec): 230.84 - samples/sec: 1426.21 - lr: 0.000010 - momentum: 0.000000 2023-10-15 15:39:34,028 epoch 7 - iter 5210/5212 - loss 0.03110437 - time (sec): 257.40 - samples/sec: 1426.17 - lr: 0.000010 - momentum: 0.000000 2023-10-15 15:39:34,151 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:39:34,152 EPOCH 7 done: loss 0.0311 - lr: 0.000010 2023-10-15 15:39:42,426 DEV : loss 0.43352431058883667 - f1-score (micro avg) 0.3734 2023-10-15 15:39:42,457 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:40:08,079 epoch 8 - iter 521/5212 - loss 0.01599032 - time (sec): 25.62 - samples/sec: 1493.11 - lr: 0.000010 - momentum: 0.000000 2023-10-15 15:40:33,449 epoch 8 - iter 1042/5212 - loss 0.01861222 - time (sec): 50.99 - samples/sec: 1472.02 - lr: 0.000009 - momentum: 0.000000 2023-10-15 15:40:58,531 epoch 8 - iter 1563/5212 - loss 0.02102450 - time (sec): 76.07 - samples/sec: 1430.47 - lr: 0.000009 - momentum: 0.000000 2023-10-15 15:41:23,941 epoch 8 - iter 2084/5212 - loss 0.02070202 - time (sec): 101.48 - samples/sec: 1429.28 - lr: 0.000009 - momentum: 0.000000 2023-10-15 15:41:49,399 epoch 8 - iter 2605/5212 - loss 0.02112981 - time (sec): 126.94 - samples/sec: 1410.72 - lr: 0.000008 - momentum: 0.000000 2023-10-15 15:42:14,841 epoch 8 - iter 3126/5212 - loss 0.02245286 - time (sec): 152.38 - samples/sec: 1417.79 - lr: 0.000008 - 
momentum: 0.000000 2023-10-15 15:42:40,962 epoch 8 - iter 3647/5212 - loss 0.02181557 - time (sec): 178.50 - samples/sec: 1437.19 - lr: 0.000008 - momentum: 0.000000 2023-10-15 15:43:06,269 epoch 8 - iter 4168/5212 - loss 0.02303020 - time (sec): 203.81 - samples/sec: 1445.16 - lr: 0.000007 - momentum: 0.000000 2023-10-15 15:43:31,410 epoch 8 - iter 4689/5212 - loss 0.02258626 - time (sec): 228.95 - samples/sec: 1441.42 - lr: 0.000007 - momentum: 0.000000 2023-10-15 15:43:57,192 epoch 8 - iter 5210/5212 - loss 0.02227638 - time (sec): 254.73 - samples/sec: 1441.65 - lr: 0.000007 - momentum: 0.000000 2023-10-15 15:43:57,287 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:43:57,287 EPOCH 8 done: loss 0.0223 - lr: 0.000007 2023-10-15 15:44:06,531 DEV : loss 0.450359582901001 - f1-score (micro avg) 0.3772 2023-10-15 15:44:06,563 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:44:31,209 epoch 9 - iter 521/5212 - loss 0.01562105 - time (sec): 24.64 - samples/sec: 1407.65 - lr: 0.000006 - momentum: 0.000000 2023-10-15 15:44:55,932 epoch 9 - iter 1042/5212 - loss 0.01453439 - time (sec): 49.37 - samples/sec: 1388.63 - lr: 0.000006 - momentum: 0.000000 2023-10-15 15:45:21,936 epoch 9 - iter 1563/5212 - loss 0.01536925 - time (sec): 75.37 - samples/sec: 1437.22 - lr: 0.000006 - momentum: 0.000000 2023-10-15 15:45:47,163 epoch 9 - iter 2084/5212 - loss 0.01490582 - time (sec): 100.60 - samples/sec: 1446.45 - lr: 0.000005 - momentum: 0.000000 2023-10-15 15:46:12,767 epoch 9 - iter 2605/5212 - loss 0.01487153 - time (sec): 126.20 - samples/sec: 1454.60 - lr: 0.000005 - momentum: 0.000000 2023-10-15 15:46:38,173 epoch 9 - iter 3126/5212 - loss 0.01533863 - time (sec): 151.61 - samples/sec: 1451.73 - lr: 0.000005 - momentum: 0.000000 2023-10-15 15:47:03,532 epoch 9 - iter 3647/5212 - loss 0.01597661 - time (sec): 176.97 - samples/sec: 1455.60 - lr: 
0.000004 - momentum: 0.000000 2023-10-15 15:47:28,771 epoch 9 - iter 4168/5212 - loss 0.01580615 - time (sec): 202.21 - samples/sec: 1456.29 - lr: 0.000004 - momentum: 0.000000 2023-10-15 15:47:54,354 epoch 9 - iter 4689/5212 - loss 0.01547589 - time (sec): 227.79 - samples/sec: 1454.52 - lr: 0.000004 - momentum: 0.000000 2023-10-15 15:48:19,499 epoch 9 - iter 5210/5212 - loss 0.01521545 - time (sec): 252.93 - samples/sec: 1452.41 - lr: 0.000003 - momentum: 0.000000 2023-10-15 15:48:19,584 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:48:19,584 EPOCH 9 done: loss 0.0152 - lr: 0.000003 2023-10-15 15:48:28,971 DEV : loss 0.4913356900215149 - f1-score (micro avg) 0.369 2023-10-15 15:48:29,002 ---------------------------------------------------------------------------------------------------- 2023-10-15 15:48:55,241 epoch 10 - iter 521/5212 - loss 0.00783394 - time (sec): 26.24 - samples/sec: 1417.76 - lr: 0.000003 - momentum: 0.000000 2023-10-15 15:49:20,769 epoch 10 - iter 1042/5212 - loss 0.01120969 - time (sec): 51.77 - samples/sec: 1427.38 - lr: 0.000003 - momentum: 0.000000 2023-10-15 15:49:46,662 epoch 10 - iter 1563/5212 - loss 0.00991988 - time (sec): 77.66 - samples/sec: 1460.85 - lr: 0.000002 - momentum: 0.000000 2023-10-15 15:50:11,819 epoch 10 - iter 2084/5212 - loss 0.00947631 - time (sec): 102.82 - samples/sec: 1461.11 - lr: 0.000002 - momentum: 0.000000 2023-10-15 15:50:36,250 epoch 10 - iter 2605/5212 - loss 0.01010491 - time (sec): 127.25 - samples/sec: 1450.05 - lr: 0.000002 - momentum: 0.000000 2023-10-15 15:51:01,852 epoch 10 - iter 3126/5212 - loss 0.01027554 - time (sec): 152.85 - samples/sec: 1444.89 - lr: 0.000001 - momentum: 0.000000 2023-10-15 15:51:27,567 epoch 10 - iter 3647/5212 - loss 0.01132153 - time (sec): 178.56 - samples/sec: 1448.36 - lr: 0.000001 - momentum: 0.000000 2023-10-15 15:51:52,655 epoch 10 - iter 4168/5212 - loss 0.01087304 - time (sec): 203.65 - 
samples/sec: 1443.38 - lr: 0.000001 - momentum: 0.000000
2023-10-15 15:52:17,989 epoch 10 - iter 4689/5212 - loss 0.01102544 - time (sec): 228.99 - samples/sec: 1443.50 - lr: 0.000000 - momentum: 0.000000
2023-10-15 15:52:42,744 epoch 10 - iter 5210/5212 - loss 0.01114682 - time (sec): 253.74 - samples/sec: 1447.96 - lr: 0.000000 - momentum: 0.000000
2023-10-15 15:52:42,831 ----------------------------------------------------------------------------------------------------
2023-10-15 15:52:42,832 EPOCH 10 done: loss 0.0111 - lr: 0.000000
2023-10-15 15:52:52,269 DEV : loss 0.47859835624694824 - f1-score (micro avg) 0.3772
2023-10-15 15:52:52,722 ----------------------------------------------------------------------------------------------------
2023-10-15 15:52:52,724 Loading model from best epoch ...
2023-10-15 15:52:54,374 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-15 15:53:11,590 
Results:
- F-score (micro) 0.4759
- F-score (macro) 0.3146
- Accuracy 0.3163

By class:
              precision    recall  f1-score   support

         LOC     0.5790    0.5947    0.5868      1214
         PER     0.4096    0.4121    0.4109       808
         ORG     0.2527    0.2691    0.2606       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4707    0.4812    0.4759      2390
   macro avg     0.3103    0.3190    0.3146      2390
weighted avg     0.4699    0.4812    0.4754      2390

2023-10-15 15:53:11,590 ----------------------------------------------------------------------------------------------------