2023-10-25 21:05:36,363 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:36,364 Model: "SequenceTagger( (embeddings): TransformerWordEmbeddings( (model): BertModel( (embeddings): BertEmbeddings( (word_embeddings): Embedding(64001, 768) (position_embeddings): Embedding(512, 768) (token_type_embeddings): Embedding(2, 768) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (encoder): BertEncoder( (layer): ModuleList( (0-11): 12 x BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) (pooler): BertPooler( (dense): Linear(in_features=768, out_features=768, bias=True) (activation): Tanh() ) ) ) (locked_dropout): LockedDropout(p=0.5) (linear): Linear(in_features=768, out_features=17, bias=True) (loss_function): CrossEntropyLoss() )" 2023-10-25 21:05:36,364 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:36,364 MultiCorpus: 1166 train + 165 dev + 415 test sentences - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator 2023-10-25 21:05:36,364 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:36,364 Train: 1166 sentences 2023-10-25 21:05:36,364 (train_with_dev=False, train_with_test=False) 2023-10-25 21:05:36,364 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:36,364 Training Params: 2023-10-25 21:05:36,364 - learning_rate: "3e-05" 2023-10-25 21:05:36,364 - mini_batch_size: "8" 2023-10-25 21:05:36,364 - max_epochs: "10" 2023-10-25 21:05:36,364 - shuffle: "True" 2023-10-25 21:05:36,364 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:36,364 Plugins: 2023-10-25 21:05:36,364 - TensorboardLogger 2023-10-25 21:05:36,364 - LinearScheduler | warmup_fraction: '0.1' 2023-10-25 21:05:36,364 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:36,364 Final evaluation on model from best epoch (best-model.pt) 2023-10-25 21:05:36,364 - metric: "('micro avg', 'f1-score')" 2023-10-25 21:05:36,364 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:36,365 Computation: 2023-10-25 21:05:36,365 - compute on device: cuda:0 2023-10-25 21:05:36,365 - embedding storage: none 2023-10-25 21:05:36,365 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:36,365 Model training base path: "hmbench-newseye/fi-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-3" 2023-10-25 21:05:36,365 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:36,365 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:36,365 Logging anything other than scalars to TensorBoard is currently not supported. 2023-10-25 21:05:37,164 epoch 1 - iter 14/146 - loss 2.83025878 - time (sec): 0.80 - samples/sec: 4267.88 - lr: 0.000003 - momentum: 0.000000 2023-10-25 21:05:37,955 epoch 1 - iter 28/146 - loss 2.46691922 - time (sec): 1.59 - samples/sec: 4479.04 - lr: 0.000006 - momentum: 0.000000 2023-10-25 21:05:38,902 epoch 1 - iter 42/146 - loss 1.86829011 - time (sec): 2.54 - samples/sec: 4669.08 - lr: 0.000008 - momentum: 0.000000 2023-10-25 21:05:39,724 epoch 1 - iter 56/146 - loss 1.54384758 - time (sec): 3.36 - samples/sec: 4663.76 - lr: 0.000011 - momentum: 0.000000 2023-10-25 21:05:40,514 epoch 1 - iter 70/146 - loss 1.34624853 - time (sec): 4.15 - samples/sec: 4705.75 - lr: 0.000014 - momentum: 0.000000 2023-10-25 21:05:41,496 epoch 1 - iter 84/146 - loss 1.20137129 - time (sec): 5.13 - samples/sec: 4668.86 - lr: 0.000017 - momentum: 0.000000 2023-10-25 21:05:42,434 epoch 1 - iter 98/146 - loss 1.07011610 - time (sec): 6.07 - samples/sec: 4743.31 - lr: 0.000020 - momentum: 0.000000 2023-10-25 21:05:43,498 epoch 1 - iter 112/146 - loss 0.97878778 - time (sec): 7.13 - samples/sec: 4737.41 - lr: 0.000023 - momentum: 0.000000 2023-10-25 21:05:44,331 epoch 1 - iter 126/146 - loss 0.90038816 - time (sec): 7.96 - samples/sec: 4778.42 - lr: 0.000026 - momentum: 0.000000 2023-10-25 21:05:45,275 epoch 1 - iter 140/146 - loss 0.82667252 - time (sec): 8.91 - samples/sec: 4803.31 - lr: 0.000029 - momentum: 0.000000 2023-10-25 21:05:45,697 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:45,697 EPOCH 1 done: loss 0.8075 - lr: 0.000029 2023-10-25 21:05:46,358 DEV : loss 0.17039310932159424 - f1-score (micro avg) 0.5702 2023-10-25 21:05:46,362 saving best model 2023-10-25 21:05:46,879 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:47,843 epoch 2 - iter 14/146 - loss 0.20229098 - time (sec): 0.96 - samples/sec: 4761.19 - lr: 0.000030 - momentum: 0.000000 2023-10-25 21:05:48,858 epoch 2 - iter 28/146 - loss 0.18766990 - time (sec): 1.98 - samples/sec: 4776.95 - lr: 0.000029 - momentum: 0.000000 2023-10-25 21:05:49,755 epoch 2 - iter 42/146 - loss 0.18648775 - time (sec): 2.88 - samples/sec: 4775.20 - lr: 0.000029 - momentum: 0.000000 2023-10-25 21:05:50,655 epoch 2 - iter 56/146 - loss 0.19029101 - time (sec): 3.77 - samples/sec: 4741.42 - lr: 0.000029 - momentum: 0.000000 2023-10-25 21:05:51,431 epoch 2 - iter 70/146 - loss 0.18913930 - time (sec): 4.55 - samples/sec: 4763.79 - lr: 0.000028 - momentum: 0.000000 2023-10-25 21:05:52,200 epoch 2 - iter 84/146 - loss 0.19230771 - time (sec): 5.32 - samples/sec: 4757.70 - lr: 0.000028 - momentum: 0.000000 2023-10-25 21:05:53,030 epoch 2 - iter 98/146 - loss 0.18747787 - time (sec): 6.15 - samples/sec: 4744.20 - lr: 0.000028 - momentum: 0.000000 2023-10-25 21:05:54,055 epoch 2 - iter 112/146 - loss 0.18004954 - time (sec): 7.18 - samples/sec: 4732.79 - lr: 0.000027 - momentum: 0.000000 2023-10-25 21:05:54,915 epoch 2 - iter 126/146 - loss 0.17380432 - time (sec): 8.03 - samples/sec: 4783.21 - lr: 0.000027 - momentum: 0.000000 2023-10-25 21:05:55,761 epoch 2 - iter 140/146 - loss 0.17394433 - time (sec): 8.88 - samples/sec: 4827.33 - lr: 0.000027 - momentum: 0.000000 2023-10-25 21:05:56,111 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:56,111 EPOCH 2 done: loss 0.1735 - lr: 0.000027 2023-10-25 21:05:57,015 DEV : loss 0.10457519441843033 - f1-score (micro avg) 0.7177 2023-10-25 21:05:57,019 saving best model 2023-10-25 21:05:57,704 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:05:58,621 epoch 3 - iter 14/146 - loss 0.09931197 - time (sec): 0.91 - samples/sec: 4567.22 - lr: 0.000026 - momentum: 0.000000 2023-10-25 21:05:59,407 epoch 3 - iter 28/146 - loss 0.09405829 - time (sec): 1.70 - samples/sec: 4431.34 - lr: 0.000026 - momentum: 0.000000 2023-10-25 21:06:00,370 epoch 3 - iter 42/146 - loss 0.08962058 - time (sec): 2.66 - samples/sec: 4500.54 - lr: 0.000026 - momentum: 0.000000 2023-10-25 21:06:01,259 epoch 3 - iter 56/146 - loss 0.08813612 - time (sec): 3.55 - samples/sec: 4372.04 - lr: 0.000025 - momentum: 0.000000 2023-10-25 21:06:02,395 epoch 3 - iter 70/146 - loss 0.09225973 - time (sec): 4.69 - samples/sec: 4529.05 - lr: 0.000025 - momentum: 0.000000 2023-10-25 21:06:03,276 epoch 3 - iter 84/146 - loss 0.09278810 - time (sec): 5.57 - samples/sec: 4642.69 - lr: 0.000025 - momentum: 0.000000 2023-10-25 21:06:04,165 epoch 3 - iter 98/146 - loss 0.09139854 - time (sec): 6.46 - samples/sec: 4698.69 - lr: 0.000024 - momentum: 0.000000 2023-10-25 21:06:04,901 epoch 3 - iter 112/146 - loss 0.09379452 - time (sec): 7.19 - samples/sec: 4736.93 - lr: 0.000024 - momentum: 0.000000 2023-10-25 21:06:05,723 epoch 3 - iter 126/146 - loss 0.09468401 - time (sec): 8.02 - samples/sec: 4729.48 - lr: 0.000024 - momentum: 0.000000 2023-10-25 21:06:06,622 epoch 3 - iter 140/146 - loss 0.09287278 - time (sec): 8.91 - samples/sec: 4744.25 - lr: 0.000024 - momentum: 0.000000 2023-10-25 21:06:07,059 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:06:07,060 EPOCH 3 done: loss 0.0934 - lr: 0.000024 2023-10-25 21:06:08,132 DEV : loss 0.09595039486885071 - f1-score (micro avg) 0.7332 2023-10-25 21:06:08,137 saving best model 2023-10-25 21:06:08,810 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:06:09,790 epoch 4 - iter 14/146 - loss 0.07411911 - time (sec): 0.98 - samples/sec: 5160.10 - lr: 0.000023 - momentum: 0.000000 2023-10-25 21:06:10,610 epoch 4 - iter 28/146 - loss 0.06840387 - time (sec): 1.80 - samples/sec: 4874.67 - lr: 0.000023 - momentum: 0.000000 2023-10-25 21:06:11,437 epoch 4 - iter 42/146 - loss 0.07292162 - time (sec): 2.62 - samples/sec: 4818.23 - lr: 0.000022 - momentum: 0.000000 2023-10-25 21:06:12,396 epoch 4 - iter 56/146 - loss 0.06469956 - time (sec): 3.58 - samples/sec: 4740.65 - lr: 0.000022 - momentum: 0.000000 2023-10-25 21:06:13,143 epoch 4 - iter 70/146 - loss 0.06518597 - time (sec): 4.33 - samples/sec: 4699.77 - lr: 0.000022 - momentum: 0.000000 2023-10-25 21:06:14,098 epoch 4 - iter 84/146 - loss 0.06696645 - time (sec): 5.29 - samples/sec: 4659.86 - lr: 0.000021 - momentum: 0.000000 2023-10-25 21:06:14,934 epoch 4 - iter 98/146 - loss 0.06661296 - time (sec): 6.12 - samples/sec: 4711.64 - lr: 0.000021 - momentum: 0.000000 2023-10-25 21:06:15,737 epoch 4 - iter 112/146 - loss 0.06367292 - time (sec): 6.92 - samples/sec: 4697.66 - lr: 0.000021 - momentum: 0.000000 2023-10-25 21:06:16,739 epoch 4 - iter 126/146 - loss 0.06160560 - time (sec): 7.93 - samples/sec: 4707.84 - lr: 0.000021 - momentum: 0.000000 2023-10-25 21:06:17,616 epoch 4 - iter 140/146 - loss 0.06032160 - time (sec): 8.80 - samples/sec: 4821.21 - lr: 0.000020 - momentum: 0.000000 2023-10-25 21:06:17,933 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:06:17,933 EPOCH 4 done: loss 0.0601 - lr: 0.000020 2023-10-25 21:06:18,846 DEV : loss 0.10524275153875351 - f1-score (micro avg) 0.7642 2023-10-25 21:06:18,850 saving best model 2023-10-25 21:06:19,534 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:06:20,415 epoch 5 - iter 14/146 - loss 0.02996552 - time (sec): 0.88 - samples/sec: 5280.96 - lr: 0.000020 - momentum: 0.000000 2023-10-25 21:06:21,159 epoch 5 - iter 28/146 - loss 0.03043510 - time (sec): 1.62 - samples/sec: 5157.31 - lr: 0.000019 - momentum: 0.000000 2023-10-25 21:06:22,023 epoch 5 - iter 42/146 - loss 0.03731623 - time (sec): 2.48 - samples/sec: 5263.23 - lr: 0.000019 - momentum: 0.000000 2023-10-25 21:06:22,930 epoch 5 - iter 56/146 - loss 0.03644991 - time (sec): 3.39 - samples/sec: 5056.90 - lr: 0.000019 - momentum: 0.000000 2023-10-25 21:06:23,888 epoch 5 - iter 70/146 - loss 0.03446408 - time (sec): 4.35 - samples/sec: 4852.99 - lr: 0.000018 - momentum: 0.000000 2023-10-25 21:06:24,758 epoch 5 - iter 84/146 - loss 0.03395159 - time (sec): 5.22 - samples/sec: 4784.45 - lr: 0.000018 - momentum: 0.000000 2023-10-25 21:06:25,718 epoch 5 - iter 98/146 - loss 0.03646640 - time (sec): 6.18 - samples/sec: 4713.73 - lr: 0.000018 - momentum: 0.000000 2023-10-25 21:06:26,607 epoch 5 - iter 112/146 - loss 0.03898602 - time (sec): 7.07 - samples/sec: 4726.24 - lr: 0.000018 - momentum: 0.000000 2023-10-25 21:06:27,672 epoch 5 - iter 126/146 - loss 0.03853003 - time (sec): 8.13 - samples/sec: 4730.88 - lr: 0.000017 - momentum: 0.000000 2023-10-25 21:06:28,445 epoch 5 - iter 140/146 - loss 0.03950165 - time (sec): 8.91 - samples/sec: 4785.32 - lr: 0.000017 - momentum: 0.000000 2023-10-25 21:06:28,835 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:06:28,836 EPOCH 5 done: loss 0.0395 - lr: 0.000017 2023-10-25 21:06:29,746 DEV : loss 0.10796511173248291 - f1-score (micro avg) 0.7617 2023-10-25 21:06:29,751 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:06:30,562 epoch 6 - iter 14/146 - loss 0.02045471 - time (sec): 0.81 - samples/sec: 5141.92 - lr: 0.000016 - momentum: 0.000000 2023-10-25 21:06:31,512 epoch 6 - iter 28/146 - loss 0.02431460 - time (sec): 1.76 - samples/sec: 4759.68 - lr: 0.000016 - momentum: 0.000000 2023-10-25 21:06:32,460 epoch 6 - iter 42/146 - loss 0.02349163 - time (sec): 2.71 - samples/sec: 4864.11 - lr: 0.000016 - momentum: 0.000000 2023-10-25 21:06:33,331 epoch 6 - iter 56/146 - loss 0.02207213 - time (sec): 3.58 - samples/sec: 4826.86 - lr: 0.000015 - momentum: 0.000000 2023-10-25 21:06:34,276 epoch 6 - iter 70/146 - loss 0.02538588 - time (sec): 4.52 - samples/sec: 4762.20 - lr: 0.000015 - momentum: 0.000000 2023-10-25 21:06:35,169 epoch 6 - iter 84/146 - loss 0.02401518 - time (sec): 5.42 - samples/sec: 4691.30 - lr: 0.000015 - momentum: 0.000000 2023-10-25 21:06:36,103 epoch 6 - iter 98/146 - loss 0.02319494 - time (sec): 6.35 - samples/sec: 4741.03 - lr: 0.000015 - momentum: 0.000000 2023-10-25 21:06:37,157 epoch 6 - iter 112/146 - loss 0.02399453 - time (sec): 7.41 - samples/sec: 4643.43 - lr: 0.000014 - momentum: 0.000000 2023-10-25 21:06:38,007 epoch 6 - iter 126/146 - loss 0.02529112 - time (sec): 8.26 - samples/sec: 4654.13 - lr: 0.000014 - momentum: 0.000000 2023-10-25 21:06:38,896 epoch 6 - iter 140/146 - loss 0.02430354 - time (sec): 9.14 - samples/sec: 4677.55 - lr: 0.000014 - momentum: 0.000000 2023-10-25 21:06:39,258 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:06:39,258 EPOCH 6 done: loss 0.0249 - lr: 0.000014 2023-10-25 21:06:40,322 DEV : loss 0.11456426978111267 - f1-score (micro avg) 0.7849 2023-10-25 21:06:40,327 saving best model 2023-10-25 21:06:40,998 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:06:41,831 epoch 7 - iter 14/146 - loss 0.01377810 - time (sec): 0.83 - samples/sec: 5025.09 - lr: 0.000013 - momentum: 0.000000 2023-10-25 21:06:43,030 epoch 7 - iter 28/146 - loss 0.02000949 - time (sec): 2.03 - samples/sec: 4964.90 - lr: 0.000013 - momentum: 0.000000 2023-10-25 21:06:43,818 epoch 7 - iter 42/146 - loss 0.02159133 - time (sec): 2.82 - samples/sec: 4774.67 - lr: 0.000012 - momentum: 0.000000 2023-10-25 21:06:44,680 epoch 7 - iter 56/146 - loss 0.02141280 - time (sec): 3.68 - samples/sec: 4696.42 - lr: 0.000012 - momentum: 0.000000 2023-10-25 21:06:45,530 epoch 7 - iter 70/146 - loss 0.02033340 - time (sec): 4.53 - samples/sec: 4722.77 - lr: 0.000012 - momentum: 0.000000 2023-10-25 21:06:46,479 epoch 7 - iter 84/146 - loss 0.01873330 - time (sec): 5.48 - samples/sec: 4804.21 - lr: 0.000012 - momentum: 0.000000 2023-10-25 21:06:47,374 epoch 7 - iter 98/146 - loss 0.01877254 - time (sec): 6.37 - samples/sec: 4815.74 - lr: 0.000011 - momentum: 0.000000 2023-10-25 21:06:48,168 epoch 7 - iter 112/146 - loss 0.01963305 - time (sec): 7.17 - samples/sec: 4765.05 - lr: 0.000011 - momentum: 0.000000 2023-10-25 21:06:48,984 epoch 7 - iter 126/146 - loss 0.01898464 - time (sec): 7.98 - samples/sec: 4793.92 - lr: 0.000011 - momentum: 0.000000 2023-10-25 21:06:49,883 epoch 7 - iter 140/146 - loss 0.01845447 - time (sec): 8.88 - samples/sec: 4782.28 - lr: 0.000010 - momentum: 0.000000 2023-10-25 21:06:50,299 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:06:50,299 EPOCH 7 done: loss 0.0181 - lr: 0.000010 2023-10-25 21:06:51,208 DEV : loss 0.13842682540416718 - f1-score (micro avg) 0.7788 2023-10-25 21:06:51,213 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:06:52,098 epoch 8 - iter 14/146 - loss 0.02179360 - time (sec): 0.88 - samples/sec: 4362.26 - lr: 0.000010 - momentum: 0.000000 2023-10-25 21:06:53,100 epoch 8 - iter 28/146 - loss 0.01457476 - time (sec): 1.89 - samples/sec: 4502.85 - lr: 0.000009 - momentum: 0.000000 2023-10-25 21:06:54,176 epoch 8 - iter 42/146 - loss 0.01363156 - time (sec): 2.96 - samples/sec: 4560.96 - lr: 0.000009 - momentum: 0.000000 2023-10-25 21:06:55,065 epoch 8 - iter 56/146 - loss 0.01358221 - time (sec): 3.85 - samples/sec: 4529.92 - lr: 0.000009 - momentum: 0.000000 2023-10-25 21:06:55,987 epoch 8 - iter 70/146 - loss 0.01502044 - time (sec): 4.77 - samples/sec: 4552.97 - lr: 0.000009 - momentum: 0.000000 2023-10-25 21:06:56,784 epoch 8 - iter 84/146 - loss 0.01489034 - time (sec): 5.57 - samples/sec: 4658.20 - lr: 0.000008 - momentum: 0.000000 2023-10-25 21:06:57,539 epoch 8 - iter 98/146 - loss 0.01449699 - time (sec): 6.32 - samples/sec: 4659.56 - lr: 0.000008 - momentum: 0.000000 2023-10-25 21:06:58,527 epoch 8 - iter 112/146 - loss 0.01427531 - time (sec): 7.31 - samples/sec: 4717.80 - lr: 0.000008 - momentum: 0.000000 2023-10-25 21:06:59,324 epoch 8 - iter 126/146 - loss 0.01369188 - time (sec): 8.11 - samples/sec: 4719.35 - lr: 0.000007 - momentum: 0.000000 2023-10-25 21:07:00,154 epoch 8 - iter 140/146 - loss 0.01269696 - time (sec): 8.94 - samples/sec: 4804.98 - lr: 0.000007 - momentum: 0.000000 2023-10-25 21:07:00,462 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:07:00,462 EPOCH 8 done: loss 0.0128 - lr: 0.000007 2023-10-25 21:07:01,376 DEV : loss 0.15397675335407257 - f1-score (micro avg) 0.7315 2023-10-25 21:07:01,380 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:07:02,243 epoch 9 - iter 14/146 - loss 0.00725404 - time (sec): 0.86 - samples/sec: 5396.63 - lr: 0.000006 - momentum: 0.000000 2023-10-25 21:07:03,063 epoch 9 - iter 28/146 - loss 0.00858283 - time (sec): 1.68 - samples/sec: 5303.83 - lr: 0.000006 - momentum: 0.000000 2023-10-25 21:07:03,844 epoch 9 - iter 42/146 - loss 0.00697012 - time (sec): 2.46 - samples/sec: 5096.55 - lr: 0.000006 - momentum: 0.000000 2023-10-25 21:07:04,896 epoch 9 - iter 56/146 - loss 0.00943578 - time (sec): 3.51 - samples/sec: 4980.37 - lr: 0.000006 - momentum: 0.000000 2023-10-25 21:07:05,865 epoch 9 - iter 70/146 - loss 0.01141288 - time (sec): 4.48 - samples/sec: 4965.81 - lr: 0.000005 - momentum: 0.000000 2023-10-25 21:07:06,742 epoch 9 - iter 84/146 - loss 0.01134343 - time (sec): 5.36 - samples/sec: 4920.44 - lr: 0.000005 - momentum: 0.000000 2023-10-25 21:07:07,668 epoch 9 - iter 98/146 - loss 0.01085725 - time (sec): 6.29 - samples/sec: 4919.30 - lr: 0.000005 - momentum: 0.000000 2023-10-25 21:07:08,425 epoch 9 - iter 112/146 - loss 0.01140709 - time (sec): 7.04 - samples/sec: 4880.19 - lr: 0.000004 - momentum: 0.000000 2023-10-25 21:07:09,309 epoch 9 - iter 126/146 - loss 0.01114475 - time (sec): 7.93 - samples/sec: 4855.84 - lr: 0.000004 - momentum: 0.000000 2023-10-25 21:07:10,174 epoch 9 - iter 140/146 - loss 0.01073435 - time (sec): 8.79 - samples/sec: 4862.91 - lr: 0.000004 - momentum: 0.000000 2023-10-25 21:07:10,483 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:07:10,483 EPOCH 9 done: loss 0.0109 - lr: 0.000004 2023-10-25 21:07:11,390 DEV : loss 0.15072251856327057 - f1-score (micro avg) 0.7598 2023-10-25 21:07:11,395 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:07:12,200 epoch 10 - iter 14/146 - loss 0.00398947 - time (sec): 0.80 - samples/sec: 5186.37 - lr: 0.000003 - momentum: 0.000000 2023-10-25 21:07:13,172 epoch 10 - iter 28/146 - loss 0.00875898 - time (sec): 1.78 - samples/sec: 4781.14 - lr: 0.000003 - momentum: 0.000000 2023-10-25 21:07:14,022 epoch 10 - iter 42/146 - loss 0.00922190 - time (sec): 2.63 - samples/sec: 4759.26 - lr: 0.000003 - momentum: 0.000000 2023-10-25 21:07:14,955 epoch 10 - iter 56/146 - loss 0.01137857 - time (sec): 3.56 - samples/sec: 4758.01 - lr: 0.000002 - momentum: 0.000000 2023-10-25 21:07:15,813 epoch 10 - iter 70/146 - loss 0.00974451 - time (sec): 4.42 - samples/sec: 4783.44 - lr: 0.000002 - momentum: 0.000000 2023-10-25 21:07:16,574 epoch 10 - iter 84/146 - loss 0.00891124 - time (sec): 5.18 - samples/sec: 4755.93 - lr: 0.000002 - momentum: 0.000000 2023-10-25 21:07:17,576 epoch 10 - iter 98/146 - loss 0.01023278 - time (sec): 6.18 - samples/sec: 4727.60 - lr: 0.000001 - momentum: 0.000000 2023-10-25 21:07:18,489 epoch 10 - iter 112/146 - loss 0.00961012 - time (sec): 7.09 - samples/sec: 4783.79 - lr: 0.000001 - momentum: 0.000000 2023-10-25 21:07:19,586 epoch 10 - iter 126/146 - loss 0.00898474 - time (sec): 8.19 - samples/sec: 4684.43 - lr: 0.000001 - momentum: 0.000000 2023-10-25 21:07:20,438 epoch 10 - iter 140/146 - loss 0.00851157 - time (sec): 9.04 - samples/sec: 4730.83 - lr: 0.000000 - momentum: 0.000000 2023-10-25 21:07:20,797 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:07:20,797 EPOCH 10 done: loss 0.0087 - lr: 0.000000 2023-10-25 21:07:21,711 DEV : loss 0.1518029123544693 - f1-score (micro avg) 0.7588 2023-10-25 21:07:22,231 ---------------------------------------------------------------------------------------------------- 2023-10-25 21:07:22,233 Loading model from best epoch ... 2023-10-25 21:07:23,959 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd 2023-10-25 21:07:25,507 Results: - F-score (micro) 0.7581 - F-score (macro) 0.6581 - Accuracy 0.6331 By class: precision recall f1-score support PER 0.7855 0.8420 0.8128 348 LOC 0.6943 0.8352 0.7583 261 ORG 0.4348 0.3846 0.4082 52 HumanProd 0.5926 0.7273 0.6531 22 micro avg 0.7197 0.8009 0.7581 683 macro avg 0.6268 0.6973 0.6581 683 weighted avg 0.7177 0.8009 0.7560 683 2023-10-25 21:07:25,507 ----------------------------------------------------------------------------------------------------