2023-10-23 22:21:59,802 ----------------------------------------------------------------------------------------------------
2023-10-23 22:21:59,803 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-23 22:21:59,803 ----------------------------------------------------------------------------------------------------
2023-10-23 22:21:59,804 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
 - NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /home/ubuntu/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
2023-10-23 22:21:59,804 ----------------------------------------------------------------------------------------------------
2023-10-23 22:21:59,804 Train:  3575 sentences
2023-10-23 22:21:59,804         (train_with_dev=False, train_with_test=False)
2023-10-23 22:21:59,804 ----------------------------------------------------------------------------------------------------
2023-10-23 22:21:59,804 Training Params:
2023-10-23 22:21:59,804  - learning_rate: "3e-05"
2023-10-23 22:21:59,804  - mini_batch_size: "8"
2023-10-23 22:21:59,804  - max_epochs: "10"
2023-10-23 22:21:59,804  - shuffle: "True"
2023-10-23 22:21:59,804 ----------------------------------------------------------------------------------------------------
2023-10-23 22:21:59,804 Plugins:
2023-10-23 22:21:59,804  - TensorboardLogger
2023-10-23 22:21:59,804  - LinearScheduler | warmup_fraction: '0.1'
2023-10-23 22:21:59,804 ----------------------------------------------------------------------------------------------------
2023-10-23 22:21:59,804 Final evaluation on model from best epoch (best-model.pt)
2023-10-23 22:21:59,804  - metric: "('micro avg', 'f1-score')"
2023-10-23 22:21:59,804 ----------------------------------------------------------------------------------------------------
2023-10-23 22:21:59,804 Computation:
2023-10-23 22:21:59,804  - compute on device: cuda:0
2023-10-23 22:21:59,804  - embedding storage: none
2023-10-23 22:21:59,804 ----------------------------------------------------------------------------------------------------
2023-10-23 22:21:59,804 Model training base path: "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-4"
2023-10-23 22:21:59,804 ----------------------------------------------------------------------------------------------------
2023-10-23 22:21:59,804 ----------------------------------------------------------------------------------------------------
2023-10-23 22:21:59,804 Logging anything other than scalars to TensorBoard is currently not supported.
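
For reference, the configuration in the header above (hmBERT-64k embeddings, first-subtoken pooling, last layer only, no CRF, batch size 8, 10 epochs at lr 3e-05, linear schedule with 10% warmup) could be reproduced with Flair's standard fine-tuning API roughly as sketched below. This is a minimal sketch, not the exact hmBench script that produced this log; the Hugging Face model id and the add_document_separator flag are inferred from the base path and dataset folder above.

    # minimal sketch of a Flair fine-tuning run matching the header above (assumed, not the original script)
    from flair.datasets import NER_HIPE_2022
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # HIPE-2020 German corpus, as listed in the MultiCorpus line above;
    # add_document_separator=True is assumed from the "with_doc_seperator" dataset folder
    corpus = NER_HIPE_2022(dataset_name="hipe2020", language="de", add_document_separator=True)
    label_dict = corpus.make_label_dictionary(label_type="ner")

    # first-subtoken pooling over the last transformer layer only ("poolingfirst-layers-1" in the base path)
    embeddings = TransformerWordEmbeddings(
        model="dbmdz/bert-base-historic-multilingual-64k-td-cased",
        layers="-1",
        subtoken_pooling="first",
        fine_tune=True,
    )

    # no CRF and no RNN ("crfFalse"): a single linear layer on top of BERT,
    # matching the (linear): Linear(768, 21) head in the model dump above
    tagger = SequenceTagger(
        hidden_size=256,  # required argument, unused when use_rnn=False
        embeddings=embeddings,
        tag_dictionary=label_dict,
        tag_type="ner",
        use_crf=False,
        use_rnn=False,
    )

    trainer = ModelTrainer(tagger, corpus)
    # fine_tune() uses AdamW with a linear schedule; its default warmup fraction of 0.1
    # corresponds to the "LinearScheduler | warmup_fraction: '0.1'" plugin line above
    trainer.fine_tune(
        "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-4",
        learning_rate=3e-5,
        mini_batch_size=8,
        max_epochs=10,
    )
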
2023-10-23 22:22:03,847 epoch 1 - iter 44/447 - loss 2.92018008 - time (sec): 4.04 - samples/sec: 2032.64 - lr: 0.000003 - momentum: 0.000000
2023-10-23 22:22:07,968 epoch 1 - iter 88/447 - loss 1.92758216 - time (sec): 8.16 - samples/sec: 2089.91 - lr: 0.000006 - momentum: 0.000000
2023-10-23 22:22:11,886 epoch 1 - iter 132/447 - loss 1.46261625 - time (sec): 12.08 - samples/sec: 2086.10 - lr: 0.000009 - momentum: 0.000000
2023-10-23 22:22:15,679 epoch 1 - iter 176/447 - loss 1.22672961 - time (sec): 15.87 - samples/sec: 2103.25 - lr: 0.000012 - momentum: 0.000000
2023-10-23 22:22:19,568 epoch 1 - iter 220/447 - loss 1.04272271 - time (sec): 19.76 - samples/sec: 2125.02 - lr: 0.000015 - momentum: 0.000000
2023-10-23 22:22:23,281 epoch 1 - iter 264/447 - loss 0.91930567 - time (sec): 23.48 - samples/sec: 2138.47 - lr: 0.000018 - momentum: 0.000000
2023-10-23 22:22:27,217 epoch 1 - iter 308/447 - loss 0.82267150 - time (sec): 27.41 - samples/sec: 2146.19 - lr: 0.000021 - momentum: 0.000000
2023-10-23 22:22:31,115 epoch 1 - iter 352/447 - loss 0.74697299 - time (sec): 31.31 - samples/sec: 2150.51 - lr: 0.000024 - momentum: 0.000000
2023-10-23 22:22:34,954 epoch 1 - iter 396/447 - loss 0.69198290 - time (sec): 35.15 - samples/sec: 2151.70 - lr: 0.000027 - momentum: 0.000000
2023-10-23 22:22:39,219 epoch 1 - iter 440/447 - loss 0.63800201 - time (sec): 39.41 - samples/sec: 2156.09 - lr: 0.000029 - momentum: 0.000000
2023-10-23 22:22:39,972 ----------------------------------------------------------------------------------------------------
2023-10-23 22:22:39,972 EPOCH 1 done: loss 0.6318 - lr: 0.000029
2023-10-23 22:22:44,793 DEV : loss 0.14997614920139313 - f1-score (micro avg) 0.6314
2023-10-23 22:22:44,814 saving best model
2023-10-23 22:22:45,370 ----------------------------------------------------------------------------------------------------
2023-10-23 22:22:49,085 epoch 2 - iter 44/447 - loss 0.14886525 - time (sec): 3.71 - samples/sec: 2151.01 - lr: 0.000030 - momentum: 0.000000
2023-10-23 22:22:52,913 epoch 2 - iter 88/447 - loss 0.14909506 - time (sec): 7.54 - samples/sec: 2231.06 - lr: 0.000029 - momentum: 0.000000
2023-10-23 22:22:57,328 epoch 2 - iter 132/447 - loss 0.14851561 - time (sec): 11.96 - samples/sec: 2173.51 - lr: 0.000029 - momentum: 0.000000
2023-10-23 22:23:01,211 epoch 2 - iter 176/447 - loss 0.15023382 - time (sec): 15.84 - samples/sec: 2163.41 - lr: 0.000029 - momentum: 0.000000
2023-10-23 22:23:05,281 epoch 2 - iter 220/447 - loss 0.15236516 - time (sec): 19.91 - samples/sec: 2154.22 - lr: 0.000028 - momentum: 0.000000
2023-10-23 22:23:09,340 epoch 2 - iter 264/447 - loss 0.14349368 - time (sec): 23.97 - samples/sec: 2128.94 - lr: 0.000028 - momentum: 0.000000
2023-10-23 22:23:13,120 epoch 2 - iter 308/447 - loss 0.14583098 - time (sec): 27.75 - samples/sec: 2131.92 - lr: 0.000028 - momentum: 0.000000
2023-10-23 22:23:16,865 epoch 2 - iter 352/447 - loss 0.14036361 - time (sec): 31.49 - samples/sec: 2124.32 - lr: 0.000027 - momentum: 0.000000
2023-10-23 22:23:21,191 epoch 2 - iter 396/447 - loss 0.13709503 - time (sec): 35.82 - samples/sec: 2125.42 - lr: 0.000027 - momentum: 0.000000
2023-10-23 22:23:25,203 epoch 2 - iter 440/447 - loss 0.13162957 - time (sec): 39.83 - samples/sec: 2136.40 - lr: 0.000027 - momentum: 0.000000
2023-10-23 22:23:25,803 ----------------------------------------------------------------------------------------------------
2023-10-23 22:23:25,803 EPOCH 2 done: loss 0.1312 - lr: 0.000027
2023-10-23 22:23:32,296 DEV : loss 0.12065546214580536 - f1-score (micro avg) 0.7139
2023-10-23 22:23:32,316 saving best model
2023-10-23 22:23:33,038 ----------------------------------------------------------------------------------------------------
2023-10-23 22:23:36,912 epoch 3 - iter 44/447 - loss 0.07762606 - time (sec): 3.87 - samples/sec: 2018.49 - lr: 0.000026 - momentum: 0.000000
2023-10-23 22:23:41,076 epoch 3 - iter 88/447 - loss 0.07919298 - time (sec): 8.04 - samples/sec: 2015.72 - lr: 0.000026 - momentum: 0.000000
2023-10-23 22:23:44,839 epoch 3 - iter 132/447 - loss 0.07541745 - time (sec): 11.80 - samples/sec: 2075.74 - lr: 0.000026 - momentum: 0.000000
2023-10-23 22:23:48,984 epoch 3 - iter 176/447 - loss 0.07757789 - time (sec): 15.95 - samples/sec: 2110.02 - lr: 0.000025 - momentum: 0.000000
2023-10-23 22:23:52,699 epoch 3 - iter 220/447 - loss 0.07526481 - time (sec): 19.66 - samples/sec: 2089.85 - lr: 0.000025 - momentum: 0.000000
2023-10-23 22:23:57,165 epoch 3 - iter 264/447 - loss 0.07614139 - time (sec): 24.13 - samples/sec: 2084.77 - lr: 0.000025 - momentum: 0.000000
2023-10-23 22:24:01,492 epoch 3 - iter 308/447 - loss 0.07545581 - time (sec): 28.45 - samples/sec: 2092.33 - lr: 0.000024 - momentum: 0.000000
2023-10-23 22:24:05,292 epoch 3 - iter 352/447 - loss 0.07211717 - time (sec): 32.25 - samples/sec: 2111.02 - lr: 0.000024 - momentum: 0.000000
2023-10-23 22:24:09,139 epoch 3 - iter 396/447 - loss 0.07279091 - time (sec): 36.10 - samples/sec: 2120.24 - lr: 0.000024 - momentum: 0.000000
2023-10-23 22:24:13,184 epoch 3 - iter 440/447 - loss 0.07369125 - time (sec): 40.15 - samples/sec: 2123.09 - lr: 0.000023 - momentum: 0.000000
2023-10-23 22:24:13,765 ----------------------------------------------------------------------------------------------------
2023-10-23 22:24:13,765 EPOCH 3 done: loss 0.0737 - lr: 0.000023
2023-10-23 22:24:20,280 DEV : loss 0.12288995832204819 - f1-score (micro avg) 0.7436
2023-10-23 22:24:20,300 saving best model
2023-10-23 22:24:21,019 ----------------------------------------------------------------------------------------------------
2023-10-23 22:24:24,930 epoch 4 - iter 44/447 - loss 0.04523597 - time (sec): 3.91 - samples/sec: 2145.55 - lr: 0.000023 - momentum: 0.000000
2023-10-23 22:24:28,688 epoch 4 - iter 88/447 - loss 0.04395548 - time (sec): 7.67 - samples/sec: 2133.79 - lr: 0.000023 - momentum: 0.000000
2023-10-23 22:24:32,564 epoch 4 - iter 132/447 - loss 0.04343610 - time (sec): 11.54 - samples/sec: 2165.17 - lr: 0.000022 - momentum: 0.000000
2023-10-23 22:24:36,880 epoch 4 - iter 176/447 - loss 0.04360098 - time (sec): 15.86 - samples/sec: 2160.67 - lr: 0.000022 - momentum: 0.000000
2023-10-23 22:24:41,124 epoch 4 - iter 220/447 - loss 0.04351512 - time (sec): 20.10 - samples/sec: 2135.58 - lr: 0.000022 - momentum: 0.000000
2023-10-23 22:24:44,974 epoch 4 - iter 264/447 - loss 0.04663272 - time (sec): 23.95 - samples/sec: 2137.05 - lr: 0.000021 - momentum: 0.000000
2023-10-23 22:24:48,692 epoch 4 - iter 308/447 - loss 0.04463606 - time (sec): 27.67 - samples/sec: 2145.58 - lr: 0.000021 - momentum: 0.000000
2023-10-23 22:24:52,690 epoch 4 - iter 352/447 - loss 0.04338000 - time (sec): 31.67 - samples/sec: 2142.49 - lr: 0.000021 - momentum: 0.000000
2023-10-23 22:24:56,574 epoch 4 - iter 396/447 - loss 0.04260646 - time (sec): 35.55 - samples/sec: 2139.33 - lr: 0.000020 - momentum: 0.000000
2023-10-23 22:25:00,808 epoch 4 - iter 440/447 - loss 0.04252168 - time (sec): 39.79 - samples/sec: 2139.56 - lr: 0.000020 - momentum: 0.000000
2023-10-23 22:25:01,476 ----------------------------------------------------------------------------------------------------
2023-10-23 22:25:01,477 EPOCH 4 done: loss 0.0423 - lr: 0.000020
2023-10-23 22:25:07,998 DEV : loss 0.16860494017601013 - f1-score (micro avg) 0.7442
2023-10-23 22:25:08,018 saving best model
2023-10-23 22:25:08,736 ----------------------------------------------------------------------------------------------------
2023-10-23 22:25:12,979 epoch 5 - iter 44/447 - loss 0.02802474 - time (sec): 4.24 - samples/sec: 2110.69 - lr: 0.000020 - momentum: 0.000000
2023-10-23 22:25:17,051 epoch 5 - iter 88/447 - loss 0.02729669 - time (sec): 8.31 - samples/sec: 2091.32 - lr: 0.000019 - momentum: 0.000000
2023-10-23 22:25:20,797 epoch 5 - iter 132/447 - loss 0.02848820 - time (sec): 12.06 - samples/sec: 2114.88 - lr: 0.000019 - momentum: 0.000000
2023-10-23 22:25:24,924 epoch 5 - iter 176/447 - loss 0.03037370 - time (sec): 16.19 - samples/sec: 2132.25 - lr: 0.000019 - momentum: 0.000000
2023-10-23 22:25:29,219 epoch 5 - iter 220/447 - loss 0.03160364 - time (sec): 20.48 - samples/sec: 2157.95 - lr: 0.000018 - momentum: 0.000000
2023-10-23 22:25:32,933 epoch 5 - iter 264/447 - loss 0.03281760 - time (sec): 24.20 - samples/sec: 2147.02 - lr: 0.000018 - momentum: 0.000000
2023-10-23 22:25:37,105 epoch 5 - iter 308/447 - loss 0.03205699 - time (sec): 28.37 - samples/sec: 2135.24 - lr: 0.000018 - momentum: 0.000000
2023-10-23 22:25:40,844 epoch 5 - iter 352/447 - loss 0.03311176 - time (sec): 32.11 - samples/sec: 2138.65 - lr: 0.000017 - momentum: 0.000000
2023-10-23 22:25:44,732 epoch 5 - iter 396/447 - loss 0.03260248 - time (sec): 36.00 - samples/sec: 2130.28 - lr: 0.000017 - momentum: 0.000000
2023-10-23 22:25:48,605 epoch 5 - iter 440/447 - loss 0.03136514 - time (sec): 39.87 - samples/sec: 2134.27 - lr: 0.000017 - momentum: 0.000000
2023-10-23 22:25:49,291 ----------------------------------------------------------------------------------------------------
2023-10-23 22:25:49,292 EPOCH 5 done: loss 0.0313 - lr: 0.000017
2023-10-23 22:25:55,808 DEV : loss 0.200847327709198 - f1-score (micro avg) 0.7667
2023-10-23 22:25:55,829 saving best model
2023-10-23 22:25:56,552 ----------------------------------------------------------------------------------------------------
2023-10-23 22:26:00,120 epoch 6 - iter 44/447 - loss 0.02177514 - time (sec): 3.57 - samples/sec: 2084.94 - lr: 0.000016 - momentum: 0.000000
2023-10-23 22:26:04,046 epoch 6 - iter 88/447 - loss 0.02190717 - time (sec): 7.49 - samples/sec: 2121.33 - lr: 0.000016 - momentum: 0.000000
2023-10-23 22:26:08,182 epoch 6 - iter 132/447 - loss 0.01946378 - time (sec): 11.63 - samples/sec: 2156.61 - lr: 0.000016 - momentum: 0.000000
2023-10-23 22:26:12,249 epoch 6 - iter 176/447 - loss 0.01968006 - time (sec): 15.70 - samples/sec: 2144.33 - lr: 0.000015 - momentum: 0.000000
2023-10-23 22:26:16,657 epoch 6 - iter 220/447 - loss 0.02027486 - time (sec): 20.10 - samples/sec: 2146.14 - lr: 0.000015 - momentum: 0.000000
2023-10-23 22:26:20,363 epoch 6 - iter 264/447 - loss 0.02058604 - time (sec): 23.81 - samples/sec: 2150.25 - lr: 0.000015 - momentum: 0.000000
2023-10-23 22:26:24,533 epoch 6 - iter 308/447 - loss 0.01936116 - time (sec): 27.98 - samples/sec: 2143.75 - lr: 0.000014 - momentum: 0.000000
2023-10-23 22:26:28,696 epoch 6 - iter 352/447 - loss 0.01879624 - time (sec): 32.14 - samples/sec: 2131.02 - lr: 0.000014 - momentum: 0.000000
2023-10-23 22:26:32,687 epoch 6 - iter 396/447 - loss 0.01900730 - time (sec): 36.13 - samples/sec: 2124.58 - lr: 0.000014 - momentum: 0.000000
2023-10-23 22:26:36,558 epoch 6 - iter 440/447 - loss 0.01897263 - time (sec): 40.01 - samples/sec: 2125.86 - lr: 0.000013 - momentum: 0.000000
2023-10-23 22:26:37,262 ----------------------------------------------------------------------------------------------------
2023-10-23 22:26:37,262 EPOCH 6 done: loss 0.0192 - lr: 0.000013
2023-10-23 22:26:43,765 DEV : loss 0.1983201950788498 - f1-score (micro avg) 0.7736
2023-10-23 22:26:43,785 saving best model
2023-10-23 22:26:44,461 ----------------------------------------------------------------------------------------------------
2023-10-23 22:26:48,789 epoch 7 - iter 44/447 - loss 0.01146703 - time (sec): 4.33 - samples/sec: 2167.42 - lr: 0.000013 - momentum: 0.000000
2023-10-23 22:26:52,693 epoch 7 - iter 88/447 - loss 0.00983839 - time (sec): 8.23 - samples/sec: 2132.95 - lr: 0.000013 - momentum: 0.000000
2023-10-23 22:26:56,465 epoch 7 - iter 132/447 - loss 0.01147917 - time (sec): 12.00 - samples/sec: 2112.49 - lr: 0.000012 - momentum: 0.000000
2023-10-23 22:27:00,181 epoch 7 - iter 176/447 - loss 0.01213536 - time (sec): 15.72 - samples/sec: 2103.43 - lr: 0.000012 - momentum: 0.000000
2023-10-23 22:27:04,208 epoch 7 - iter 220/447 - loss 0.01145021 - time (sec): 19.75 - samples/sec: 2114.57 - lr: 0.000012 - momentum: 0.000000
2023-10-23 22:27:08,092 epoch 7 - iter 264/447 - loss 0.01226204 - time (sec): 23.63 - samples/sec: 2104.33 - lr: 0.000011 - momentum: 0.000000
2023-10-23 22:27:12,363 epoch 7 - iter 308/447 - loss 0.01179778 - time (sec): 27.90 - samples/sec: 2117.04 - lr: 0.000011 - momentum: 0.000000
2023-10-23 22:27:16,673 epoch 7 - iter 352/447 - loss 0.01203243 - time (sec): 32.21 - samples/sec: 2136.52 - lr: 0.000011 - momentum: 0.000000
2023-10-23 22:27:20,542 epoch 7 - iter 396/447 - loss 0.01222001 - time (sec): 36.08 - samples/sec: 2133.49 - lr: 0.000010 - momentum: 0.000000
2023-10-23 22:27:24,318 epoch 7 - iter 440/447 - loss 0.01181947 - time (sec): 39.86 - samples/sec: 2132.84 - lr: 0.000010 - momentum: 0.000000
2023-10-23 22:27:24,945 ----------------------------------------------------------------------------------------------------
2023-10-23 22:27:24,945 EPOCH 7 done: loss 0.0120 - lr: 0.000010
2023-10-23 22:27:31,441 DEV : loss 0.21346606314182281 - f1-score (micro avg) 0.7828
2023-10-23 22:27:31,462 saving best model
2023-10-23 22:27:32,110 ----------------------------------------------------------------------------------------------------
2023-10-23 22:27:36,118 epoch 8 - iter 44/447 - loss 0.00685455 - time (sec): 4.01 - samples/sec: 2125.06 - lr: 0.000010 - momentum: 0.000000
2023-10-23 22:27:40,362 epoch 8 - iter 88/447 - loss 0.01104525 - time (sec): 8.25 - samples/sec: 2070.56 - lr: 0.000009 - momentum: 0.000000
2023-10-23 22:27:44,233 epoch 8 - iter 132/447 - loss 0.00915699 - time (sec): 12.12 - samples/sec: 2113.11 - lr: 0.000009 - momentum: 0.000000
2023-10-23 22:27:48,854 epoch 8 - iter 176/447 - loss 0.00762872 - time (sec): 16.74 - samples/sec: 2106.65 - lr: 0.000009 - momentum: 0.000000
2023-10-23 22:27:52,512 epoch 8 - iter 220/447 - loss 0.00683083 - time (sec): 20.40 - samples/sec: 2110.20 - lr: 0.000008 - momentum: 0.000000
2023-10-23 22:27:56,394 epoch 8 - iter 264/447 - loss 0.00802052 - time (sec): 24.28 - samples/sec: 2126.46 - lr: 0.000008 - momentum: 0.000000
2023-10-23 22:28:00,106 epoch 8 - iter 308/447 - loss 0.00865904 - time (sec): 27.99 - samples/sec: 2135.97 - lr: 0.000008 - momentum: 0.000000
2023-10-23 22:28:03,741 epoch 8 - iter 352/447 - loss 0.00841205 - time (sec): 31.63 - samples/sec: 2128.04 - lr: 0.000007 - momentum: 0.000000
2023-10-23 22:28:07,701 epoch 8 - iter 396/447 - loss 0.00783808 - time (sec): 35.59 - samples/sec: 2131.82 - lr: 0.000007 - momentum: 0.000000
2023-10-23 22:28:12,081 epoch 8 - iter 440/447 - loss 0.00814555 - time (sec): 39.97 - samples/sec: 2130.93 - lr: 0.000007 - momentum: 0.000000
2023-10-23 22:28:12,772 ----------------------------------------------------------------------------------------------------
2023-10-23 22:28:12,772 EPOCH 8 done: loss 0.0080 - lr: 0.000007
2023-10-23 22:28:19,278 DEV : loss 0.2258313149213791 - f1-score (micro avg) 0.7854
2023-10-23 22:28:19,299 saving best model
2023-10-23 22:28:19,998 ----------------------------------------------------------------------------------------------------
2023-10-23 22:28:23,671 epoch 9 - iter 44/447 - loss 0.00629803 - time (sec): 3.67 - samples/sec: 2175.80 - lr: 0.000006 - momentum: 0.000000
2023-10-23 22:28:27,528 epoch 9 - iter 88/447 - loss 0.00804701 - time (sec): 7.53 - samples/sec: 2141.78 - lr: 0.000006 - momentum: 0.000000
2023-10-23 22:28:31,356 epoch 9 - iter 132/447 - loss 0.00634854 - time (sec): 11.36 - samples/sec: 2175.71 - lr: 0.000006 - momentum: 0.000000
2023-10-23 22:28:35,001 epoch 9 - iter 176/447 - loss 0.00589061 - time (sec): 15.00 - samples/sec: 2187.76 - lr: 0.000005 - momentum: 0.000000
2023-10-23 22:28:38,990 epoch 9 - iter 220/447 - loss 0.00579974 - time (sec): 18.99 - samples/sec: 2180.18 - lr: 0.000005 - momentum: 0.000000
2023-10-23 22:28:43,354 epoch 9 - iter 264/447 - loss 0.00610104 - time (sec): 23.35 - samples/sec: 2182.46 - lr: 0.000005 - momentum: 0.000000
2023-10-23 22:28:47,454 epoch 9 - iter 308/447 - loss 0.00636396 - time (sec): 27.45 - samples/sec: 2171.15 - lr: 0.000004 - momentum: 0.000000
2023-10-23 22:28:51,842 epoch 9 - iter 352/447 - loss 0.00587760 - time (sec): 31.84 - samples/sec: 2164.71 - lr: 0.000004 - momentum: 0.000000
2023-10-23 22:28:55,760 epoch 9 - iter 396/447 - loss 0.00548106 - time (sec): 35.76 - samples/sec: 2152.27 - lr: 0.000004 - momentum: 0.000000
2023-10-23 22:28:59,716 epoch 9 - iter 440/447 - loss 0.00551406 - time (sec): 39.72 - samples/sec: 2148.37 - lr: 0.000003 - momentum: 0.000000
2023-10-23 22:29:00,334 ----------------------------------------------------------------------------------------------------
2023-10-23 22:29:00,334 EPOCH 9 done: loss 0.0055 - lr: 0.000003
2023-10-23 22:29:06,821 DEV : loss 0.2376076877117157 - f1-score (micro avg) 0.779
2023-10-23 22:29:06,841 ----------------------------------------------------------------------------------------------------
2023-10-23 22:29:11,130 epoch 10 - iter 44/447 - loss 0.00018174 - time (sec): 4.29 - samples/sec: 2068.38 - lr: 0.000003 - momentum: 0.000000
2023-10-23 22:29:15,127 epoch 10 - iter 88/447 - loss 0.00104477 - time (sec): 8.29 - samples/sec: 2068.77 - lr: 0.000003 - momentum: 0.000000
2023-10-23 22:29:18,817 epoch 10 - iter 132/447 - loss 0.00073255 - time (sec): 11.98 - samples/sec: 2148.11 - lr: 0.000002 - momentum: 0.000000
2023-10-23 22:29:22,963 epoch 10 - iter 176/447 - loss 0.00142077 - time (sec): 16.12 - samples/sec: 2160.08 - lr: 0.000002 - momentum: 0.000000
2023-10-23 22:29:26,746 epoch 10 - iter 220/447 - loss 0.00213151 - time (sec): 19.90 - samples/sec: 2152.25 - lr: 0.000002 - momentum: 0.000000
2023-10-23 22:29:30,414 epoch 10 - iter 264/447 - loss 0.00292525 - time (sec): 23.57 - samples/sec: 2153.18 - lr: 0.000001 - momentum: 0.000000
2023-10-23 22:29:34,566 epoch 10 - iter 308/447 - loss 0.00317358 - time (sec): 27.72 - samples/sec: 2149.54 - lr: 0.000001 - momentum: 0.000000
2023-10-23 22:29:38,286 epoch 10 - iter 352/447 - loss 0.00305912 - time (sec): 31.44 - samples/sec: 2136.75 - lr: 0.000001 - momentum: 0.000000
2023-10-23 22:29:42,353 epoch 10 - iter 396/447 - loss 0.00332032 - time (sec): 35.51 - samples/sec: 2142.38 - lr: 0.000000 - momentum: 0.000000
2023-10-23 22:29:46,365 epoch 10 - iter 440/447 - loss 0.00332155 - time (sec): 39.52 - samples/sec: 2131.35 - lr: 0.000000 - momentum: 0.000000
2023-10-23 22:29:47,401 ----------------------------------------------------------------------------------------------------
2023-10-23 22:29:47,401 EPOCH 10 done: loss 0.0033 - lr: 0.000000
2023-10-23 22:29:53,628 DEV : loss 0.23625218868255615 - f1-score (micro avg) 0.7868
2023-10-23 22:29:53,649 saving best model
2023-10-23 22:29:54,860 ----------------------------------------------------------------------------------------------------
2023-10-23 22:29:54,861 Loading model from best epoch ...
2023-10-23 22:29:56,906 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
2023-10-23 22:30:01,450 Results:
- F-score (micro) 0.7506
- F-score (macro) 0.6633
- Accuracy 0.6181

By class:
              precision    recall  f1-score   support

         loc     0.8218    0.8591    0.8400       596
        pers     0.6789    0.7808    0.7263       333
         org     0.5161    0.4848    0.5000       132
        prod     0.6600    0.5000    0.5690        66
        time     0.7381    0.6327    0.6813        49

   micro avg     0.7365    0.7653    0.7506      1176
   macro avg     0.6830    0.6515    0.6633      1176
weighted avg     0.7345    0.7653    0.7478      1176

2023-10-23 22:30:01,450 ----------------------------------------------------------------------------------------------------
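
For downstream use, the best-model.pt written under the base path above can be loaded directly. The tagger emits the BIOES tags listed above (S- for single-token entities, B-/I-/E- for multi-token spans, O for outside), which Flair decodes into typed spans. A minimal usage sketch; the example sentence is illustrative, not taken from the corpus:

    # minimal inference sketch against the checkpoint saved at the best dev epoch
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(
        "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-"
        "bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-4/best-model.pt"
    )

    # illustrative German sentence (hypothetical, not from HIPE-2020)
    sentence = Sentence("Herr Meyer reiste von Zürich nach Wien .")
    tagger.predict(sentence)

    # BIOES tags (e.g. S-loc, or B-pers ... E-pers) are decoded into typed spans
    for entity in sentence.get_spans("ner"):
        print(entity.text, entity.get_label("ner").value, entity.score)
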