2023-10-13 13:36:06,213 ----------------------------------------------------------------------------------------------------
2023-10-13 13:36:06,214 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 13:36:06,214 ----------------------------------------------------------------------------------------------------
2023-10-13 13:36:06,214 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
 - NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
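The `(linear)` layer at the end of the model maps each 768-dimensional token representation to 21 scores, one per tag. The 21 tags are listed in the `SequenceTagger predicts` line near the end of this log. They are `O` plus four BIOES prefixes for each of the five entity types `loc`, `pers`, `org`, `prod` and `time`, which gives 4 * 5 + 1 = 21. The prefixes are S = single-token entity, B = begin, I = inside and E = end.

The following is a minimal illustrative sketch, not Flair's implementation. `bioes_tags` and `decode_spans` are hypothetical helpers. Discarding ill-formed tag sequences is an assumption of this sketch; a real decoder may repair them instead.

```python
# Hypothetical helpers illustrating the BIOES tag set; not part of Flair's API.
ENTITY_TYPES = ["loc", "pers", "org", "prod", "time"]
PREFIXES = ["S", "B", "E", "I"]  # order chosen to match the printed dictionary


def bioes_tags(entity_types):
    """Return the tag inventory: "O" plus one S/B/E/I tag per entity type."""
    tags = ["O"]
    for etype in entity_types:
        tags.extend(f"{prefix}-{etype}" for prefix in PREFIXES)
    return tags


def decode_spans(tags):
    """Convert a BIOES tag sequence into (start, end_inclusive, type) spans.

    Ill-formed fragments (e.g. an E- tag whose type does not match the open
    B- tag) are discarded in this sketch.
    """
    spans = []
    open_start, open_type = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            open_start, open_type = None, None
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "S":
            spans.append((i, i, etype))
            open_start, open_type = None, None
        elif prefix == "B":
            open_start, open_type = i, etype
        elif prefix == "I":
            if etype != open_type:  # I- without a matching B-: drop the fragment
                open_start, open_type = None, None
        elif prefix == "E":
            if open_start is not None and etype == open_type:
                spans.append((open_start, i, etype))
            open_start, open_type = None, None
    return spans


TAGS = bioes_tags(ENTITY_TYPES)
print(len(TAGS))  # 21, the out_features of the (linear) layer
print(decode_spans(["B-pers", "I-pers", "E-pers", "O", "S-loc"]))
```

This model has no CRF, as shown by `crfFalse` in the base path and the plain `CrossEntropyLoss`. Each token's tag is therefore scored independently, and nothing in the model itself rules out a sequence such as `B-org` followed by `E-loc`. The decoding step has to decide how to treat such a sequence.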
2023-10-13 13:36:06,214 ----------------------------------------------------------------------------------------------------
2023-10-13 13:36:06,214 Train: 3575 sentences
2023-10-13 13:36:06,214 (train_with_dev=False, train_with_test=False)
2023-10-13 13:36:06,214 ----------------------------------------------------------------------------------------------------
2023-10-13 13:36:06,214 Training Params:
2023-10-13 13:36:06,214 - learning_rate: "5e-05"
2023-10-13 13:36:06,214 - mini_batch_size: "4"
2023-10-13 13:36:06,215 - max_epochs: "10"
2023-10-13 13:36:06,215 - shuffle: "True"
2023-10-13 13:36:06,215 ----------------------------------------------------------------------------------------------------
2023-10-13 13:36:06,215 Plugins:
2023-10-13 13:36:06,215 - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 13:36:06,215 ----------------------------------------------------------------------------------------------------
2023-10-13 13:36:06,215 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 13:36:06,215 - metric: "('micro avg', 'f1-score')"
2023-10-13 13:36:06,215 ----------------------------------------------------------------------------------------------------
2023-10-13 13:36:06,215 Computation:
2023-10-13 13:36:06,215 - compute on device: cuda:0
2023-10-13 13:36:06,215 - embedding storage: none
2023-10-13 13:36:06,215 ----------------------------------------------------------------------------------------------------
2023-10-13 13:36:06,215 Model training base path: "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-cased-bs4-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-4"
2023-10-13 13:36:06,215 ----------------------------------------------------------------------------------------------------
2023-10-13 13:36:06,215 ----------------------------------------------------------------------------------------------------
2023-10-13 13:36:10,938 epoch 1 - iter 89/894 - loss 2.68748942 - time (sec): 4.72 - samples/sec: 1872.42 - lr: 0.000005 - momentum: 0.000000
2023-10-13 13:36:15,495 epoch 1 - iter 178/894 - loss 1.59012845 - time (sec): 9.28 - samples/sec: 1947.09 - lr: 0.000010 - momentum: 0.000000
2023-10-13 13:36:19,942 epoch 1 - iter 267/894 - loss 1.22746441 - time (sec): 13.73 - samples/sec: 1893.71 - lr: 0.000015 - momentum: 0.000000
2023-10-13 13:36:24,240 epoch 1 - iter 356/894 - loss 0.99377093 - time (sec): 18.02 - samples/sec: 1937.51 - lr: 0.000020 - momentum: 0.000000
2023-10-13 13:36:28,368 epoch 1 - iter 445/894 - loss 0.85393894 - time (sec): 22.15 - samples/sec: 1951.39 - lr: 0.000025 - momentum: 0.000000
2023-10-13 13:36:32,638 epoch 1 - iter 534/894 - loss 0.76142709 - time (sec): 26.42 - samples/sec: 1955.45 - lr: 0.000030 - momentum: 0.000000
2023-10-13 13:36:36,984 epoch 1 - iter 623/894 - loss 0.69553409 - time (sec): 30.77 - samples/sec: 1945.95 - lr: 0.000035 - momentum: 0.000000
2023-10-13 13:36:41,113 epoch 1 - iter 712/894 - loss 0.64037032 - time (sec): 34.90 - samples/sec: 1961.99 - lr: 0.000040 - momentum: 0.000000
2023-10-13 13:36:45,296 epoch 1 - iter 801/894 - loss 0.59367766 - time (sec): 39.08 - samples/sec: 1957.13 - lr: 0.000045 - momentum: 0.000000
2023-10-13 13:36:49,755 epoch 1 - iter 890/894 - loss 0.55614722 - time (sec): 43.54 - samples/sec: 1979.46 - lr: 0.000050 - momentum: 0.000000
2023-10-13 13:36:49,955 ----------------------------------------------------------------------------------------------------
2023-10-13 13:36:49,955 EPOCH 1 done: loss 0.5541 - lr: 0.000050
2023-10-13 13:36:54,994 DEV : loss 0.25372514128685 - f1-score (micro avg) 0.5462
2023-10-13 13:36:55,024 saving best model
2023-10-13 13:36:55,386 ----------------------------------------------------------------------------------------------------
2023-10-13 13:36:59,708 epoch 2 - iter 89/894 - loss 0.17468129 - time (sec): 4.32 - samples/sec: 2088.65 - lr: 0.000049 - momentum: 0.000000
2023-10-13 13:37:04,099 epoch 2 - iter 178/894 - loss 0.17014545 - time (sec): 8.71 - samples/sec: 2135.36 - lr: 0.000049 - momentum: 0.000000
2023-10-13 13:37:08,087 epoch 2 - iter 267/894 - loss 0.16084483 - time (sec): 12.70 - samples/sec: 2087.62 - lr: 0.000048 - momentum: 0.000000
2023-10-13 13:37:12,211 epoch 2 - iter 356/894 - loss 0.16855910 - time (sec): 16.82 - samples/sec: 2070.57 - lr: 0.000048 - momentum: 0.000000
2023-10-13 13:37:16,385 epoch 2 - iter 445/894 - loss 0.16531549 - time (sec): 21.00 - samples/sec: 2077.00 - lr: 0.000047 - momentum: 0.000000
2023-10-13 13:37:20,503 epoch 2 - iter 534/894 - loss 0.15775841 - time (sec): 25.11 - samples/sec: 2082.98 - lr: 0.000047 - momentum: 0.000000
2023-10-13 13:37:24,683 epoch 2 - iter 623/894 - loss 0.15772402 - time (sec): 29.30 - samples/sec: 2063.56 - lr: 0.000046 - momentum: 0.000000
2023-10-13 13:37:28,927 epoch 2 - iter 712/894 - loss 0.15883528 - time (sec): 33.54 - samples/sec: 2046.72 - lr: 0.000046 - momentum: 0.000000
2023-10-13 13:37:33,333 epoch 2 - iter 801/894 - loss 0.15729564 - time (sec): 37.95 - samples/sec: 2042.63 - lr: 0.000045 - momentum: 0.000000
2023-10-13 13:37:37,353 epoch 2 - iter 890/894 - loss 0.15659491 - time (sec): 41.97 - samples/sec: 2054.37 - lr: 0.000044 - momentum: 0.000000
2023-10-13 13:37:37,536 ----------------------------------------------------------------------------------------------------
2023-10-13 13:37:37,536 EPOCH 2 done: loss 0.1565 - lr: 0.000044
2023-10-13 13:37:46,225 DEV : loss 0.146720752120018 - f1-score (micro avg) 0.6869
2023-10-13 13:37:46,253 saving best model
2023-10-13 13:37:46,641 ----------------------------------------------------------------------------------------------------
2023-10-13 13:37:50,828 epoch 3 - iter 89/894 - loss 0.11101622 - time (sec): 4.19 - samples/sec: 2071.00 - lr: 0.000044 - momentum: 0.000000
2023-10-13 13:37:55,412 epoch 3 - iter 178/894 - loss 0.10493274 - time (sec): 8.77 - samples/sec: 2056.15 - lr: 0.000043 - momentum: 0.000000
2023-10-13 13:37:59,662 epoch 3 - iter 267/894 - loss 0.10198121 - time (sec): 13.02 - samples/sec: 2031.68 - lr: 0.000043 - momentum: 0.000000
2023-10-13 13:38:03,763 epoch 3 - iter 356/894 - loss 0.10072889 - time (sec): 17.12 - samples/sec: 2031.21 - lr: 0.000042 - momentum: 0.000000
2023-10-13 13:38:07,891 epoch 3 - iter 445/894 - loss 0.10004708 - time (sec): 21.25 - samples/sec: 2003.26 - lr: 0.000042 - momentum: 0.000000
2023-10-13 13:38:11,984 epoch 3 - iter 534/894 - loss 0.09936209 - time (sec): 25.34 - samples/sec: 2018.71 - lr: 0.000041 - momentum: 0.000000
2023-10-13 13:38:16,023 epoch 3 - iter 623/894 - loss 0.09825369 - time (sec): 29.38 - samples/sec: 2020.23 - lr: 0.000041 - momentum: 0.000000
2023-10-13 13:38:20,434 epoch 3 - iter 712/894 - loss 0.09475486 - time (sec): 33.79 - samples/sec: 2027.08 - lr: 0.000040 - momentum: 0.000000
2023-10-13 13:38:24,656 epoch 3 - iter 801/894 - loss 0.09884765 - time (sec): 38.01 - samples/sec: 2018.81 - lr: 0.000039 - momentum: 0.000000
2023-10-13 13:38:28,994 epoch 3 - iter 890/894 - loss 0.09665535 - time (sec): 42.35 - samples/sec: 2034.77 - lr: 0.000039 - momentum: 0.000000
2023-10-13 13:38:29,183 ----------------------------------------------------------------------------------------------------
2023-10-13 13:38:29,183 EPOCH 3 done: loss 0.0966 - lr: 0.000039
2023-10-13 13:38:38,013 DEV : loss 0.1882152259349823 - f1-score (micro avg) 0.7134
2023-10-13 13:38:38,040 saving best model
2023-10-13 13:38:38,468 ----------------------------------------------------------------------------------------------------
2023-10-13 13:38:42,765 epoch 4 - iter 89/894 - loss 0.05521601 - time (sec): 4.29 - samples/sec: 2010.24 - lr: 0.000038 - momentum: 0.000000
2023-10-13 13:38:46,774 epoch 4 - iter 178/894 - loss 0.06025113 - time (sec): 8.30 - samples/sec: 2000.44 - lr: 0.000038 - momentum: 0.000000
2023-10-13 13:38:50,728 epoch 4 - iter 267/894 - loss 0.06246618 - time (sec): 12.25 - samples/sec: 2061.29 - lr: 0.000037 - momentum: 0.000000
2023-10-13 13:38:54,961 epoch 4 - iter 356/894 - loss 0.05713263 - time (sec): 16.49 - samples/sec: 2066.43 - lr: 0.000037 - momentum: 0.000000
2023-10-13 13:38:59,367 epoch 4 - iter 445/894 - loss 0.05854806 - time (sec): 20.89 - samples/sec: 2093.48 - lr: 0.000036 - momentum: 0.000000
2023-10-13 13:39:03,669 epoch 4 - iter 534/894 - loss 0.05727048 - time (sec): 25.19 - samples/sec: 2083.68 - lr: 0.000036 - momentum: 0.000000
2023-10-13 13:39:07,689 epoch 4 - iter 623/894 - loss 0.05943191 - time (sec): 29.21 - samples/sec: 2067.43 - lr: 0.000035 - momentum: 0.000000
2023-10-13 13:39:11,950 epoch 4 - iter 712/894 - loss 0.05928586 - time (sec): 33.48 - samples/sec: 2074.11 - lr: 0.000034 - momentum: 0.000000
2023-10-13 13:39:16,241 epoch 4 - iter 801/894 - loss 0.06054469 - time (sec): 37.77 - samples/sec: 2068.73 - lr: 0.000034 - momentum: 0.000000
2023-10-13 13:39:20,667 epoch 4 - iter 890/894 - loss 0.05991357 - time (sec): 42.19 - samples/sec: 2042.27 - lr: 0.000033 - momentum: 0.000000
2023-10-13 13:39:20,844 ----------------------------------------------------------------------------------------------------
2023-10-13 13:39:20,844 EPOCH 4 done: loss 0.0601 - lr: 0.000033
2023-10-13 13:39:29,758 DEV : loss 0.19933967292308807 - f1-score (micro avg) 0.7387
2023-10-13 13:39:29,792 saving best model
2023-10-13 13:39:30,297 ----------------------------------------------------------------------------------------------------
2023-10-13 13:39:34,666 epoch 5 - iter 89/894 - loss 0.05581229 - time (sec): 4.37 - samples/sec: 2074.31 - lr: 0.000033 - momentum: 0.000000
2023-10-13 13:39:39,035 epoch 5 - iter 178/894 - loss 0.04722653 - time (sec): 8.74 - samples/sec: 1991.98 - lr: 0.000032 - momentum: 0.000000
2023-10-13 13:39:43,612 epoch 5 - iter 267/894 - loss 0.04585486 - time (sec): 13.31 - samples/sec: 1974.54 - lr: 0.000032 - momentum: 0.000000
2023-10-13 13:39:47,897 epoch 5 - iter 356/894 - loss 0.04522439 - time (sec): 17.60 - samples/sec: 1991.17 - lr: 0.000031 - momentum: 0.000000
2023-10-13 13:39:52,087 epoch 5 - iter 445/894 - loss 0.04771289 - time (sec): 21.79 - samples/sec: 1986.95 - lr: 0.000031 - momentum: 0.000000
2023-10-13 13:39:56,379 epoch 5 - iter 534/894 - loss 0.04810893 - time (sec): 26.08 - samples/sec: 1984.66 - lr: 0.000030 - momentum: 0.000000
2023-10-13 13:40:00,872 epoch 5 - iter 623/894 - loss 0.04557853 - time (sec): 30.57 - samples/sec: 2003.60 - lr: 0.000029 - momentum: 0.000000
2023-10-13 13:40:05,100 epoch 5 - iter 712/894 - loss 0.04568690 - time (sec): 34.80 - samples/sec: 1991.16 - lr: 0.000029 - momentum: 0.000000
2023-10-13 13:40:09,267 epoch 5 - iter 801/894 - loss 0.04528489 - time (sec): 38.97 - samples/sec: 1993.82 - lr: 0.000028 - momentum: 0.000000
2023-10-13 13:40:13,313 epoch 5 - iter 890/894 - loss 0.04320919 - time (sec): 43.01 - samples/sec: 2005.20 - lr: 0.000028 - momentum: 0.000000
2023-10-13 13:40:13,482 ----------------------------------------------------------------------------------------------------
2023-10-13 13:40:13,482 EPOCH 5 done: loss 0.0432 - lr: 0.000028
2023-10-13 13:40:22,114 DEV : loss 0.22585448622703552 - f1-score (micro avg) 0.7412
2023-10-13 13:40:22,142 saving best model
2023-10-13 13:40:22,609 ----------------------------------------------------------------------------------------------------
2023-10-13 13:40:27,061 epoch 6 - iter 89/894 - loss 0.02085790 - time (sec): 4.45 - samples/sec: 1968.09 - lr: 0.000027 - momentum: 0.000000
2023-10-13 13:40:31,364 epoch 6 - iter 178/894 - loss 0.02817642 - time (sec): 8.75 - samples/sec: 2031.27 - lr: 0.000027 - momentum: 0.000000
2023-10-13 13:40:35,396 epoch 6 - iter 267/894 - loss 0.02565474 - time (sec): 12.79 - samples/sec: 2051.52 - lr: 0.000026 - momentum: 0.000000
2023-10-13 13:40:39,892 epoch 6 - iter 356/894 - loss 0.02217424 - time (sec): 17.28 - samples/sec: 2095.95 - lr: 0.000026 - momentum: 0.000000
2023-10-13 13:40:44,068 epoch 6 - iter 445/894 - loss 0.02178014 - time (sec): 21.46 - samples/sec: 2037.88 - lr: 0.000025 - momentum: 0.000000
2023-10-13 13:40:48,065 epoch 6 - iter 534/894 - loss 0.02116497 - time (sec): 25.45 - samples/sec: 2044.48 - lr: 0.000024 - momentum: 0.000000
2023-10-13 13:40:52,279 epoch 6 - iter 623/894 - loss 0.02285682 - time (sec): 29.67 - samples/sec: 2040.62 - lr: 0.000024 - momentum: 0.000000
2023-10-13 13:40:56,435 epoch 6 - iter 712/894 - loss 0.02393454 - time (sec): 33.82 - samples/sec: 2029.42 - lr: 0.000023 - momentum: 0.000000
2023-10-13 13:41:00,740 epoch 6 - iter 801/894 - loss 0.02490852 - time (sec): 38.13 - samples/sec: 2034.66 - lr: 0.000023 - momentum: 0.000000
2023-10-13 13:41:04,960 epoch 6 - iter 890/894 - loss 0.02709514 - time (sec): 42.35 - samples/sec: 2036.01 - lr: 0.000022 - momentum: 0.000000
2023-10-13 13:41:05,130 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:05,130 EPOCH 6 done: loss 0.0271 - lr: 0.000022
2023-10-13 13:41:13,883 DEV : loss 0.20660947263240814 - f1-score (micro avg) 0.7692
2023-10-13 13:41:13,912 saving best model
2023-10-13 13:41:14,371 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:18,988 epoch 7 - iter 89/894 - loss 0.01441146 - time (sec): 4.61 - samples/sec: 2170.14 - lr: 0.000022 - momentum: 0.000000
2023-10-13 13:41:23,142 epoch 7 - iter 178/894 - loss 0.01336769 - time (sec): 8.77 - samples/sec: 2064.61 - lr: 0.000021 - momentum: 0.000000
2023-10-13 13:41:27,267 epoch 7 - iter 267/894 - loss 0.01184471 - time (sec): 12.89 - samples/sec: 2090.67 - lr: 0.000021 - momentum: 0.000000
2023-10-13 13:41:31,390 epoch 7 - iter 356/894 - loss 0.01381643 - time (sec): 17.02 - samples/sec: 2108.61 - lr: 0.000020 - momentum: 0.000000
2023-10-13 13:41:35,581 epoch 7 - iter 445/894 - loss 0.01343301 - time (sec): 21.21 - samples/sec: 2094.77 - lr: 0.000019 - momentum: 0.000000
2023-10-13 13:41:39,673 epoch 7 - iter 534/894 - loss 0.01496204 - time (sec): 25.30 - samples/sec: 2059.50 - lr: 0.000019 - momentum: 0.000000
2023-10-13 13:41:43,879 epoch 7 - iter 623/894 - loss 0.01637548 - time (sec): 29.51 - samples/sec: 2062.67 - lr: 0.000018 - momentum: 0.000000
2023-10-13 13:41:48,037 epoch 7 - iter 712/894 - loss 0.01538496 - time (sec): 33.66 - samples/sec: 2054.63 - lr: 0.000018 - momentum: 0.000000
2023-10-13 13:41:52,153 epoch 7 - iter 801/894 - loss 0.01717510 - time (sec): 37.78 - samples/sec: 2051.43 - lr: 0.000017 - momentum: 0.000000
2023-10-13 13:41:56,385 epoch 7 - iter 890/894 - loss 0.01693132 - time (sec): 42.01 - samples/sec: 2051.71 - lr: 0.000017 - momentum: 0.000000
2023-10-13 13:41:56,565 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:56,565 EPOCH 7 done: loss 0.0169 - lr: 0.000017
2023-10-13 13:42:05,324 DEV : loss 0.2536468803882599 - f1-score (micro avg) 0.7769
2023-10-13 13:42:05,352 saving best model
2023-10-13 13:42:05,779 ----------------------------------------------------------------------------------------------------
2023-10-13 13:42:10,163 epoch 8 - iter 89/894 - loss 0.00786799 - time (sec): 4.38 - samples/sec: 1996.79 - lr: 0.000016 - momentum: 0.000000
2023-10-13 13:42:14,442 epoch 8 - iter 178/894 - loss 0.00866875 - time (sec): 8.66 - samples/sec: 1994.51 - lr: 0.000016 - momentum: 0.000000
2023-10-13 13:42:18,591 epoch 8 - iter 267/894 - loss 0.01092078 - time (sec): 12.81 - samples/sec: 2002.69 - lr: 0.000015 - momentum: 0.000000
2023-10-13 13:42:22,760 epoch 8 - iter 356/894 - loss 0.01198771 - time (sec): 16.98 - samples/sec: 1994.23 - lr: 0.000014 - momentum: 0.000000
2023-10-13 13:42:26,964 epoch 8 - iter 445/894 - loss 0.01138557 - time (sec): 21.18 - samples/sec: 1994.40 - lr: 0.000014 - momentum: 0.000000
2023-10-13 13:42:31,193 epoch 8 - iter 534/894 - loss 0.01099923 - time (sec): 25.41 - samples/sec: 1991.11 - lr: 0.000013 - momentum: 0.000000
2023-10-13 13:42:35,348 epoch 8 - iter 623/894 - loss 0.01067929 - time (sec): 29.57 - samples/sec: 1999.99 - lr: 0.000013 - momentum: 0.000000
2023-10-13 13:42:39,827 epoch 8 - iter 712/894 - loss 0.01143117 - time (sec): 34.05 - samples/sec: 2006.75 - lr: 0.000012 - momentum: 0.000000
2023-10-13 13:42:44,083 epoch 8 - iter 801/894 - loss 0.01122106 - time (sec): 38.30 - samples/sec: 2025.04 - lr: 0.000012 - momentum: 0.000000
2023-10-13 13:42:48,301 epoch 8 - iter 890/894 - loss 0.01097617 - time (sec): 42.52 - samples/sec: 2029.05 - lr: 0.000011 - momentum: 0.000000
2023-10-13 13:42:48,491 ----------------------------------------------------------------------------------------------------
2023-10-13 13:42:48,492 EPOCH 8 done: loss 0.0109 - lr: 0.000011
2023-10-13 13:42:57,409 DEV : loss 0.25275343656539917 - f1-score (micro avg) 0.7708
2023-10-13 13:42:57,440 ----------------------------------------------------------------------------------------------------
2023-10-13 13:43:02,157 epoch 9 - iter 89/894 - loss 0.00495732 - time (sec): 4.72 - samples/sec: 1777.88 - lr: 0.000011 - momentum: 0.000000
2023-10-13 13:43:06,477 epoch 9 - iter 178/894 - loss 0.00486139 - time (sec): 9.04 - samples/sec: 1901.16 - lr: 0.000010 - momentum: 0.000000
2023-10-13 13:43:10,501 epoch 9 - iter 267/894 - loss 0.00630451 - time (sec): 13.06 - samples/sec: 1931.23 - lr: 0.000009 - momentum: 0.000000
2023-10-13 13:43:14,571 epoch 9 - iter 356/894 - loss 0.00739423 - time (sec): 17.13 - samples/sec: 1957.10 - lr: 0.000009 - momentum: 0.000000
2023-10-13 13:43:18,711 epoch 9 - iter 445/894 - loss 0.00718639 - time (sec): 21.27 - samples/sec: 2000.40 - lr: 0.000008 - momentum: 0.000000
2023-10-13 13:43:22,820 epoch 9 - iter 534/894 - loss 0.00649298 - time (sec): 25.38 - samples/sec: 2010.27 - lr: 0.000008 - momentum: 0.000000
2023-10-13 13:43:26,935 epoch 9 - iter 623/894 - loss 0.00722465 - time (sec): 29.49 - samples/sec: 2039.50 - lr: 0.000007 - momentum: 0.000000
2023-10-13 13:43:31,387 epoch 9 - iter 712/894 - loss 0.00773812 - time (sec): 33.95 - samples/sec: 2060.28 - lr: 0.000007 - momentum: 0.000000
2023-10-13 13:43:35,413 epoch 9 - iter 801/894 - loss 0.00726079 - time (sec): 37.97 - samples/sec: 2056.51 - lr: 0.000006 - momentum: 0.000000
2023-10-13 13:43:39,402 epoch 9 - iter 890/894 - loss 0.00690011 - time (sec): 41.96 - samples/sec: 2056.85 - lr: 0.000006 - momentum: 0.000000
2023-10-13 13:43:39,576 ----------------------------------------------------------------------------------------------------
2023-10-13 13:43:39,576 EPOCH 9 done: loss 0.0069 - lr: 0.000006
2023-10-13 13:43:48,273 DEV : loss 0.2594471275806427 - f1-score (micro avg) 0.7778
2023-10-13 13:43:48,300 saving best model
2023-10-13 13:43:48,726 ----------------------------------------------------------------------------------------------------
2023-10-13 13:43:53,074 epoch 10 - iter 89/894 - loss 0.00215822 - time (sec): 4.35 - samples/sec: 2283.09 - lr: 0.000005 - momentum: 0.000000
2023-10-13 13:43:57,281 epoch 10 - iter 178/894 - loss 0.00389851 - time (sec): 8.55 - samples/sec: 2190.38 - lr: 0.000004 - momentum: 0.000000
2023-10-13 13:44:01,325 epoch 10 - iter 267/894 - loss 0.00350003 - time (sec): 12.60 - samples/sec: 2146.14 - lr: 0.000004 - momentum: 0.000000
2023-10-13 13:44:05,445 epoch 10 - iter 356/894 - loss 0.00399930 - time (sec): 16.72 - samples/sec: 2102.62 - lr: 0.000003 - momentum: 0.000000
2023-10-13 13:44:09,795 epoch 10 - iter 445/894 - loss 0.00340014 - time (sec): 21.07 - samples/sec: 2080.76 - lr: 0.000003 - momentum: 0.000000
2023-10-13 13:44:14,542 epoch 10 - iter 534/894 - loss 0.00358239 - time (sec): 25.82 - samples/sec: 2011.27 - lr: 0.000002 - momentum: 0.000000
2023-10-13 13:44:19,279 epoch 10 - iter 623/894 - loss 0.00371600 - time (sec): 30.55 - samples/sec: 1968.33 - lr: 0.000002 - momentum: 0.000000
2023-10-13 13:44:23,736 epoch 10 - iter 712/894 - loss 0.00366171 - time (sec): 35.01 - samples/sec: 1968.68 - lr: 0.000001 - momentum: 0.000000
2023-10-13 13:44:28,034 epoch 10 - iter 801/894 - loss 0.00355658 - time (sec): 39.31 - samples/sec: 1972.71 - lr: 0.000001 - momentum: 0.000000
2023-10-13 13:44:32,317 epoch 10 - iter 890/894 - loss 0.00415920 - time (sec): 43.59 - samples/sec: 1977.84 - lr: 0.000000 - momentum: 0.000000
2023-10-13 13:44:32,511 ----------------------------------------------------------------------------------------------------
2023-10-13 13:44:32,511 EPOCH 10 done: loss 0.0041 - lr: 0.000000
2023-10-13 13:44:41,095 DEV : loss 0.26341870427131653 - f1-score (micro avg) 0.7824
2023-10-13 13:44:41,123 saving best model
2023-10-13 13:44:41,945 ----------------------------------------------------------------------------------------------------
2023-10-13 13:44:41,946 Loading model from best epoch ...
2023-10-13 13:44:43,435 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
2023-10-13 13:44:48,077 Results:
- F-score (micro) 0.7298
- F-score (macro) 0.658
- Accuracy 0.5973

By class:
              precision    recall  f1-score   support

         loc     0.8023    0.8238    0.8129       596
        pers     0.6329    0.7508    0.6868       333
         org     0.5833    0.5303    0.5556       132
        prod     0.6087    0.4242    0.5000        66
        time     0.7347    0.7347    0.7347        49

   micro avg     0.7160    0.7440    0.7298      1176
   macro avg     0.6724    0.6528    0.6580      1176
weighted avg     0.7161    0.7440    0.7275      1176

2023-10-13 13:44:48,078 ----------------------------------------------------------------------------------------------------
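The `lr` values in the iteration lines can be reproduced from the run configuration:

- With 3575 training sentences and `mini_batch_size` 4, one epoch is ceil(3575 / 4) = 894 iterations, which is the `/894` in each line.
- 10 epochs therefore give 8940 optimizer steps.
- `warmup_fraction` 0.1 makes the warmup exactly 894 steps, i.e. the whole of epoch 1.

That is why `lr` climbs to 0.000050 by the end of epoch 1 and then decays linearly to 0.000000 in epoch 10.

This sketch assumes the `LinearScheduler` plugin follows the usual linear-warmup/linear-decay rule. `linear_warmup_lr` and `logged_lr` are hypothetical names. Flair's internal step bookkeeping may differ from this by one step, which should not change the six-decimal values checked here.

```python
import math

# Values taken from the log header.
TRAIN_SENTENCES = 3575
MINI_BATCH_SIZE = 4
MAX_EPOCHS = 10
PEAK_LR = 5e-05
WARMUP_FRACTION = 0.1

steps_per_epoch = math.ceil(TRAIN_SENTENCES / MINI_BATCH_SIZE)  # 894
total_steps = steps_per_epoch * MAX_EPOCHS                       # 8940
warmup_steps = int(WARMUP_FRACTION * total_steps)                # 894


def linear_warmup_lr(step, total, warmup, peak):
    """Assumed schedule: linear ramp 0 -> peak, then linear decay peak -> 0."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


def logged_lr(epoch, iteration):
    """Format the assumed lr the way the log prints it (six decimals)."""
    step = (epoch - 1) * steps_per_epoch + iteration
    return f"{linear_warmup_lr(step, total_steps, warmup_steps, PEAK_LR):.6f}"


for epoch, iteration in [(1, 89), (1, 890), (2, 89), (5, 445), (10, 890)]:
    print(f"epoch {epoch} - iter {iteration}/{steps_per_epoch} - lr: {logged_lr(epoch, iteration)}")
```

Each printed value matches the corresponding iteration line above: 0.000005, 0.000050, 0.000049, 0.000031 and 0.000000.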
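The final report lists three averages that are easy to confuse:

- **micro avg** pools all entity predictions before computing precision, recall and F1. The frequent `loc` class (596 of 1176 gold entities) therefore weighs most heavily in it. This is the metric used to pick the best epoch (`('micro avg', 'f1-score')`).
- **macro avg** is the unweighted mean over the five classes.
- **weighted avg** weights each class's score by its support.

This sketch recomputes the class averages from the rounded per-class rows. Micro F1 cannot be recomputed exactly from this table, because it needs the pooled true-positive, false-positive and false-negative counts. It is only approximated here from the rounded micro precision and recall. `REPORT`, `harmonic_f1` and `class_average` are hypothetical names.

```python
# Per-class rows (precision, recall, f1, support) copied from the final test report.
REPORT = {
    "loc": (0.8023, 0.8238, 0.8129, 596),
    "pers": (0.6329, 0.7508, 0.6868, 333),
    "org": (0.5833, 0.5303, 0.5556, 132),
    "prod": (0.6087, 0.4242, 0.5000, 66),
    "time": (0.7347, 0.7347, 0.7347, 49),
}
PRECISION, RECALL, F1, SUPPORT = range(4)


def harmonic_f1(p, r):
    """F1 as the harmonic mean of precision and recall."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def class_average(report, column, weighted=False):
    """Macro (unweighted) or support-weighted mean of one column."""
    rows = report.values()
    if weighted:
        total = sum(row[SUPPORT] for row in rows)
        return sum(row[column] * row[SUPPORT] for row in rows) / total
    return sum(row[column] for row in rows) / len(report)


total_support = sum(row[SUPPORT] for row in REPORT.values())
macro_f1 = class_average(REPORT, F1)
weighted_f1 = class_average(REPORT, F1, weighted=True)
# Approximation only: the exact micro F1 needs the pooled TP/FP/FN counts.
micro_f1_approx = harmonic_f1(0.7160, 0.7440)

print(total_support, round(macro_f1, 4), round(weighted_f1, 4), round(micro_f1_approx, 4))
```

The recomputed macro and weighted F1 agree with the report's 0.6580 and 0.7275. The approximate micro F1 lands within rounding of the logged 0.7298.

Macro F1 is the mean of the per-class F1 scores, not the harmonic mean of macro precision and recall; the latter would give roughly 0.662 here. The gap reflects how unevenly `prod` and `org` trade precision against recall.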