2023-10-23 20:34:46,257 ----------------------------------------------------------------------------------------------------
2023-10-23 20:34:46,258 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-23 20:34:46,258 ----------------------------------------------------------------------------------------------------
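A model with this structure can be assembled in Flair roughly as follows. This is a minimal sketch, not the exact training script: the checkpoint name and the pooling/layer settings are inferred from the base path logged further below, and label_dictionary is built from the corpus (see the corpus sketch below).

from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger

# hmBERT backbone with a 64k vocabulary (matches Embedding(64001, 768) above);
# last layer only and first-subtoken pooling, inferred from the base path
# suffix "poolingfirst-layers-1".
embeddings = TransformerWordEmbeddings(
    model="dbmdz/bert-base-historic-multilingual-64k-td-cased",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear head (768 -> 21 tags), no RNN, no CRF -- matching the
# LockedDropout(p=0.5) + Linear + CrossEntropyLoss modules printed above.
tagger = SequenceTagger(
    hidden_size=256,                  # required by the constructor, unused without an RNN
    embeddings=embeddings,
    tag_dictionary=label_dictionary,  # built from the corpus, see below
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)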
2023-10-23 20:34:46,258 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
- NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /home/ubuntu/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
2023-10-23 20:34:46,258 ----------------------------------------------------------------------------------------------------
2023-10-23 20:34:46,258 Train: 3575 sentences
2023-10-23 20:34:46,258 (train_with_dev=False, train_with_test=False)
2023-10-23 20:34:46,258 ----------------------------------------------------------------------------------------------------
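The corpus above is the German HIPE-2020 configuration of HIPE-2022, which Flair ships as a dataset class. A sketch, assuming the keyword names of recent Flair releases; the exact arguments are inferred from the dataset path logged above:

from flair.datasets import NER_HIPE_2022

# German HIPE-2020 data -- inferred from
# .../ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator in the log.
corpus = NER_HIPE_2022(dataset_name="hipe2020", language="de")
print(corpus)  # 3575 train + 1235 dev + 1266 test sentences, as logged above

# Tag dictionary for the linear head (21 BIOES tags, listed at the end of the log)
label_dictionary = corpus.make_label_dictionary(label_type="ner")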
2023-10-23 20:34:46,258 Training Params:
2023-10-23 20:34:46,258 - learning_rate: "5e-05"
2023-10-23 20:34:46,258 - mini_batch_size: "8"
2023-10-23 20:34:46,258 - max_epochs: "10"
2023-10-23 20:34:46,258 - shuffle: "True"
2023-10-23 20:34:46,258 ----------------------------------------------------------------------------------------------------
2023-10-23 20:34:46,258 Plugins:
2023-10-23 20:34:46,258 - TensorboardLogger
2023-10-23 20:34:46,258 - LinearScheduler | warmup_fraction: '0.1'
2023-10-23 20:34:46,258 ----------------------------------------------------------------------------------------------------
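These hyperparameters correspond to a standard Flair fine-tuning run, roughly as sketched below. The plugin import path is an assumption and may differ across Flair versions; fine_tune() applies a linear schedule with warmup by default, which matches the LinearScheduler entry above.

from flair.trainers import ModelTrainer
from flair.trainers.plugins import TensorboardLogger  # import path may vary by version

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-1",
    learning_rate=5e-05,
    mini_batch_size=8,
    max_epochs=10,
    plugins=[TensorboardLogger()],  # warmup_fraction 0.1 is the fine_tune default
)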
2023-10-23 20:34:46,258 Final evaluation on model from best epoch (best-model.pt)
2023-10-23 20:34:46,258 - metric: "('micro avg', 'f1-score')"
2023-10-23 20:34:46,258 ----------------------------------------------------------------------------------------------------
2023-10-23 20:34:46,258 Computation:
2023-10-23 20:34:46,258 - compute on device: cuda:0
2023-10-23 20:34:46,258 - embedding storage: none
2023-10-23 20:34:46,258 ----------------------------------------------------------------------------------------------------
2023-10-23 20:34:46,259 Model training base path: "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-1"
2023-10-23 20:34:46,259 ----------------------------------------------------------------------------------------------------
2023-10-23 20:34:46,259 ----------------------------------------------------------------------------------------------------
2023-10-23 20:34:46,259 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-23 20:34:50,214 epoch 1 - iter 44/447 - loss 3.08591702 - time (sec): 3.95 - samples/sec: 2151.57 - lr: 0.000005 - momentum: 0.000000
2023-10-23 20:34:53,927 epoch 1 - iter 88/447 - loss 1.92199617 - time (sec): 7.67 - samples/sec: 2132.97 - lr: 0.000010 - momentum: 0.000000
2023-10-23 20:34:57,836 epoch 1 - iter 132/447 - loss 1.39896841 - time (sec): 11.58 - samples/sec: 2162.64 - lr: 0.000015 - momentum: 0.000000
2023-10-23 20:35:01,868 epoch 1 - iter 176/447 - loss 1.12827946 - time (sec): 15.61 - samples/sec: 2127.79 - lr: 0.000020 - momentum: 0.000000
2023-10-23 20:35:05,662 epoch 1 - iter 220/447 - loss 0.96436501 - time (sec): 19.40 - samples/sec: 2147.43 - lr: 0.000024 - momentum: 0.000000
2023-10-23 20:35:09,538 epoch 1 - iter 264/447 - loss 0.84161150 - time (sec): 23.28 - samples/sec: 2137.60 - lr: 0.000029 - momentum: 0.000000
2023-10-23 20:35:13,581 epoch 1 - iter 308/447 - loss 0.75094933 - time (sec): 27.32 - samples/sec: 2132.17 - lr: 0.000034 - momentum: 0.000000
2023-10-23 20:35:18,004 epoch 1 - iter 352/447 - loss 0.67600080 - time (sec): 31.74 - samples/sec: 2139.46 - lr: 0.000039 - momentum: 0.000000
2023-10-23 20:35:22,004 epoch 1 - iter 396/447 - loss 0.62004085 - time (sec): 35.74 - samples/sec: 2148.08 - lr: 0.000044 - momentum: 0.000000
2023-10-23 20:35:26,004 epoch 1 - iter 440/447 - loss 0.58216934 - time (sec): 39.74 - samples/sec: 2148.19 - lr: 0.000049 - momentum: 0.000000
2023-10-23 20:35:26,580 ----------------------------------------------------------------------------------------------------
2023-10-23 20:35:26,580 EPOCH 1 done: loss 0.5760 - lr: 0.000049
2023-10-23 20:35:31,379 DEV : loss 0.1480439156293869 - f1-score (micro avg) 0.663
2023-10-23 20:35:31,399 saving best model
2023-10-23 20:35:31,947 ----------------------------------------------------------------------------------------------------
2023-10-23 20:35:36,424 epoch 2 - iter 44/447 - loss 0.15891281 - time (sec): 4.48 - samples/sec: 2130.44 - lr: 0.000049 - momentum: 0.000000
2023-10-23 20:35:40,292 epoch 2 - iter 88/447 - loss 0.14907678 - time (sec): 8.34 - samples/sec: 2134.91 - lr: 0.000049 - momentum: 0.000000
2023-10-23 20:35:44,386 epoch 2 - iter 132/447 - loss 0.14353574 - time (sec): 12.44 - samples/sec: 2100.19 - lr: 0.000048 - momentum: 0.000000
2023-10-23 20:35:48,177 epoch 2 - iter 176/447 - loss 0.14493075 - time (sec): 16.23 - samples/sec: 2121.25 - lr: 0.000048 - momentum: 0.000000
2023-10-23 20:35:52,020 epoch 2 - iter 220/447 - loss 0.14099654 - time (sec): 20.07 - samples/sec: 2112.92 - lr: 0.000047 - momentum: 0.000000
2023-10-23 20:35:56,054 epoch 2 - iter 264/447 - loss 0.14001060 - time (sec): 24.11 - samples/sec: 2122.33 - lr: 0.000047 - momentum: 0.000000
2023-10-23 20:36:00,154 epoch 2 - iter 308/447 - loss 0.13478504 - time (sec): 28.21 - samples/sec: 2129.26 - lr: 0.000046 - momentum: 0.000000
2023-10-23 20:36:04,356 epoch 2 - iter 352/447 - loss 0.13430691 - time (sec): 32.41 - samples/sec: 2123.93 - lr: 0.000046 - momentum: 0.000000
2023-10-23 20:36:08,229 epoch 2 - iter 396/447 - loss 0.13148998 - time (sec): 36.28 - samples/sec: 2118.19 - lr: 0.000045 - momentum: 0.000000
2023-10-23 20:36:12,037 epoch 2 - iter 440/447 - loss 0.12964162 - time (sec): 40.09 - samples/sec: 2124.43 - lr: 0.000045 - momentum: 0.000000
2023-10-23 20:36:12,661 ----------------------------------------------------------------------------------------------------
2023-10-23 20:36:12,661 EPOCH 2 done: loss 0.1306 - lr: 0.000045
2023-10-23 20:36:19,168 DEV : loss 0.14241820573806763 - f1-score (micro avg) 0.6282
2023-10-23 20:36:19,189 ----------------------------------------------------------------------------------------------------
2023-10-23 20:36:23,034 epoch 3 - iter 44/447 - loss 0.06032349 - time (sec): 3.84 - samples/sec: 2092.70 - lr: 0.000044 - momentum: 0.000000
2023-10-23 20:36:27,150 epoch 3 - iter 88/447 - loss 0.06341120 - time (sec): 7.96 - samples/sec: 2146.02 - lr: 0.000043 - momentum: 0.000000
2023-10-23 20:36:31,170 epoch 3 - iter 132/447 - loss 0.06583153 - time (sec): 11.98 - samples/sec: 2102.85 - lr: 0.000043 - momentum: 0.000000
2023-10-23 20:36:35,055 epoch 3 - iter 176/447 - loss 0.07127918 - time (sec): 15.87 - samples/sec: 2120.46 - lr: 0.000042 - momentum: 0.000000
2023-10-23 20:36:38,731 epoch 3 - iter 220/447 - loss 0.07341253 - time (sec): 19.54 - samples/sec: 2113.76 - lr: 0.000042 - momentum: 0.000000
2023-10-23 20:36:42,651 epoch 3 - iter 264/447 - loss 0.07021264 - time (sec): 23.46 - samples/sec: 2133.69 - lr: 0.000041 - momentum: 0.000000
2023-10-23 20:36:46,615 epoch 3 - iter 308/447 - loss 0.07048521 - time (sec): 27.43 - samples/sec: 2135.27 - lr: 0.000041 - momentum: 0.000000
2023-10-23 20:36:50,424 epoch 3 - iter 352/447 - loss 0.07189812 - time (sec): 31.23 - samples/sec: 2132.33 - lr: 0.000040 - momentum: 0.000000
2023-10-23 20:36:54,663 epoch 3 - iter 396/447 - loss 0.07322618 - time (sec): 35.47 - samples/sec: 2126.10 - lr: 0.000040 - momentum: 0.000000
2023-10-23 20:36:58,537 epoch 3 - iter 440/447 - loss 0.07165755 - time (sec): 39.35 - samples/sec: 2138.37 - lr: 0.000039 - momentum: 0.000000
2023-10-23 20:36:59,519 ----------------------------------------------------------------------------------------------------
2023-10-23 20:36:59,520 EPOCH 3 done: loss 0.0713 - lr: 0.000039
2023-10-23 20:37:06,006 DEV : loss 0.1675412803888321 - f1-score (micro avg) 0.7561
2023-10-23 20:37:06,026 saving best model
2023-10-23 20:37:06,800 ----------------------------------------------------------------------------------------------------
2023-10-23 20:37:11,011 epoch 4 - iter 44/447 - loss 0.05463008 - time (sec): 4.21 - samples/sec: 2139.44 - lr: 0.000038 - momentum: 0.000000
2023-10-23 20:37:14,807 epoch 4 - iter 88/447 - loss 0.04619879 - time (sec): 8.01 - samples/sec: 2158.54 - lr: 0.000038 - momentum: 0.000000
2023-10-23 20:37:18,672 epoch 4 - iter 132/447 - loss 0.04878841 - time (sec): 11.87 - samples/sec: 2152.81 - lr: 0.000037 - momentum: 0.000000
2023-10-23 20:37:22,367 epoch 4 - iter 176/447 - loss 0.04429914 - time (sec): 15.57 - samples/sec: 2154.49 - lr: 0.000037 - momentum: 0.000000
2023-10-23 20:37:26,630 epoch 4 - iter 220/447 - loss 0.04506808 - time (sec): 19.83 - samples/sec: 2147.70 - lr: 0.000036 - momentum: 0.000000
2023-10-23 20:37:30,420 epoch 4 - iter 264/447 - loss 0.04554775 - time (sec): 23.62 - samples/sec: 2128.75 - lr: 0.000036 - momentum: 0.000000
2023-10-23 20:37:34,240 epoch 4 - iter 308/447 - loss 0.04617356 - time (sec): 27.44 - samples/sec: 2133.27 - lr: 0.000035 - momentum: 0.000000
2023-10-23 20:37:38,186 epoch 4 - iter 352/447 - loss 0.04610295 - time (sec): 31.39 - samples/sec: 2128.76 - lr: 0.000035 - momentum: 0.000000
2023-10-23 20:37:42,725 epoch 4 - iter 396/447 - loss 0.04655391 - time (sec): 35.92 - samples/sec: 2126.27 - lr: 0.000034 - momentum: 0.000000
2023-10-23 20:37:46,676 epoch 4 - iter 440/447 - loss 0.04692162 - time (sec): 39.87 - samples/sec: 2134.89 - lr: 0.000033 - momentum: 0.000000
2023-10-23 20:37:47,359 ----------------------------------------------------------------------------------------------------
2023-10-23 20:37:47,359 EPOCH 4 done: loss 0.0469 - lr: 0.000033
2023-10-23 20:37:53,844 DEV : loss 0.1665799915790558 - f1-score (micro avg) 0.74
2023-10-23 20:37:53,865 ----------------------------------------------------------------------------------------------------
2023-10-23 20:37:57,539 epoch 5 - iter 44/447 - loss 0.04951173 - time (sec): 3.67 - samples/sec: 2080.67 - lr: 0.000033 - momentum: 0.000000
2023-10-23 20:38:01,398 epoch 5 - iter 88/447 - loss 0.04294255 - time (sec): 7.53 - samples/sec: 2096.59 - lr: 0.000032 - momentum: 0.000000
2023-10-23 20:38:05,711 epoch 5 - iter 132/447 - loss 0.04150257 - time (sec): 11.85 - samples/sec: 2092.61 - lr: 0.000032 - momentum: 0.000000
2023-10-23 20:38:09,518 epoch 5 - iter 176/447 - loss 0.03750633 - time (sec): 15.65 - samples/sec: 2115.03 - lr: 0.000031 - momentum: 0.000000
2023-10-23 20:38:13,240 epoch 5 - iter 220/447 - loss 0.03634709 - time (sec): 19.37 - samples/sec: 2126.46 - lr: 0.000031 - momentum: 0.000000
2023-10-23 20:38:17,675 epoch 5 - iter 264/447 - loss 0.03702155 - time (sec): 23.81 - samples/sec: 2132.75 - lr: 0.000030 - momentum: 0.000000
2023-10-23 20:38:21,375 epoch 5 - iter 308/447 - loss 0.03726439 - time (sec): 27.51 - samples/sec: 2144.51 - lr: 0.000030 - momentum: 0.000000
2023-10-23 20:38:25,414 epoch 5 - iter 352/447 - loss 0.03644814 - time (sec): 31.55 - samples/sec: 2155.96 - lr: 0.000029 - momentum: 0.000000
2023-10-23 20:38:29,193 epoch 5 - iter 396/447 - loss 0.03584344 - time (sec): 35.33 - samples/sec: 2146.52 - lr: 0.000028 - momentum: 0.000000
2023-10-23 20:38:33,763 epoch 5 - iter 440/447 - loss 0.03535728 - time (sec): 39.90 - samples/sec: 2136.98 - lr: 0.000028 - momentum: 0.000000
2023-10-23 20:38:34,376 ----------------------------------------------------------------------------------------------------
2023-10-23 20:38:34,376 EPOCH 5 done: loss 0.0356 - lr: 0.000028
2023-10-23 20:38:40,847 DEV : loss 0.2225048989057541 - f1-score (micro avg) 0.7541
2023-10-23 20:38:40,867 ----------------------------------------------------------------------------------------------------
2023-10-23 20:38:44,634 epoch 6 - iter 44/447 - loss 0.02861626 - time (sec): 3.77 - samples/sec: 2196.59 - lr: 0.000027 - momentum: 0.000000
2023-10-23 20:38:48,495 epoch 6 - iter 88/447 - loss 0.03090318 - time (sec): 7.63 - samples/sec: 2182.60 - lr: 0.000027 - momentum: 0.000000
2023-10-23 20:38:53,070 epoch 6 - iter 132/447 - loss 0.03202164 - time (sec): 12.20 - samples/sec: 2163.53 - lr: 0.000026 - momentum: 0.000000
2023-10-23 20:38:57,419 epoch 6 - iter 176/447 - loss 0.03010408 - time (sec): 16.55 - samples/sec: 2132.30 - lr: 0.000026 - momentum: 0.000000
2023-10-23 20:39:01,536 epoch 6 - iter 220/447 - loss 0.02822882 - time (sec): 20.67 - samples/sec: 2131.41 - lr: 0.000025 - momentum: 0.000000
2023-10-23 20:39:05,582 epoch 6 - iter 264/447 - loss 0.02593149 - time (sec): 24.71 - samples/sec: 2134.92 - lr: 0.000025 - momentum: 0.000000
2023-10-23 20:39:09,205 epoch 6 - iter 308/447 - loss 0.02519418 - time (sec): 28.34 - samples/sec: 2124.42 - lr: 0.000024 - momentum: 0.000000
2023-10-23 20:39:12,950 epoch 6 - iter 352/447 - loss 0.02614899 - time (sec): 32.08 - samples/sec: 2131.97 - lr: 0.000023 - momentum: 0.000000
2023-10-23 20:39:16,887 epoch 6 - iter 396/447 - loss 0.02612555 - time (sec): 36.02 - samples/sec: 2129.44 - lr: 0.000023 - momentum: 0.000000
2023-10-23 20:39:20,812 epoch 6 - iter 440/447 - loss 0.02515450 - time (sec): 39.94 - samples/sec: 2136.58 - lr: 0.000022 - momentum: 0.000000
2023-10-23 20:39:21,436 ----------------------------------------------------------------------------------------------------
2023-10-23 20:39:21,436 EPOCH 6 done: loss 0.0252 - lr: 0.000022
2023-10-23 20:39:27,918 DEV : loss 0.24188880622386932 - f1-score (micro avg) 0.7539
2023-10-23 20:39:27,939 ----------------------------------------------------------------------------------------------------
2023-10-23 20:39:32,281 epoch 7 - iter 44/447 - loss 0.03582544 - time (sec): 4.34 - samples/sec: 2165.08 - lr: 0.000022 - momentum: 0.000000
2023-10-23 20:39:36,428 epoch 7 - iter 88/447 - loss 0.02343980 - time (sec): 8.49 - samples/sec: 2104.12 - lr: 0.000021 - momentum: 0.000000
2023-10-23 20:39:40,637 epoch 7 - iter 132/447 - loss 0.02247574 - time (sec): 12.70 - samples/sec: 2128.59 - lr: 0.000021 - momentum: 0.000000
2023-10-23 20:39:44,459 epoch 7 - iter 176/447 - loss 0.02484634 - time (sec): 16.52 - samples/sec: 2120.83 - lr: 0.000020 - momentum: 0.000000
2023-10-23 20:39:48,293 epoch 7 - iter 220/447 - loss 0.02289894 - time (sec): 20.35 - samples/sec: 2111.43 - lr: 0.000020 - momentum: 0.000000
2023-10-23 20:39:52,124 epoch 7 - iter 264/447 - loss 0.02072518 - time (sec): 24.18 - samples/sec: 2123.10 - lr: 0.000019 - momentum: 0.000000
2023-10-23 20:39:56,375 epoch 7 - iter 308/447 - loss 0.02007036 - time (sec): 28.44 - samples/sec: 2125.41 - lr: 0.000018 - momentum: 0.000000
2023-10-23 20:40:00,110 epoch 7 - iter 352/447 - loss 0.01916875 - time (sec): 32.17 - samples/sec: 2143.94 - lr: 0.000018 - momentum: 0.000000
2023-10-23 20:40:04,092 epoch 7 - iter 396/447 - loss 0.01862712 - time (sec): 36.15 - samples/sec: 2127.24 - lr: 0.000017 - momentum: 0.000000
2023-10-23 20:40:07,948 epoch 7 - iter 440/447 - loss 0.01780365 - time (sec): 40.01 - samples/sec: 2138.09 - lr: 0.000017 - momentum: 0.000000
2023-10-23 20:40:08,469 ----------------------------------------------------------------------------------------------------
2023-10-23 20:40:08,469 EPOCH 7 done: loss 0.0178 - lr: 0.000017
2023-10-23 20:40:14,944 DEV : loss 0.26893866062164307 - f1-score (micro avg) 0.7761
2023-10-23 20:40:14,964 saving best model
2023-10-23 20:40:15,678 ----------------------------------------------------------------------------------------------------
2023-10-23 20:40:19,533 epoch 8 - iter 44/447 - loss 0.00921673 - time (sec): 3.85 - samples/sec: 2158.83 - lr: 0.000016 - momentum: 0.000000
2023-10-23 20:40:23,365 epoch 8 - iter 88/447 - loss 0.01119385 - time (sec): 7.69 - samples/sec: 2183.92 - lr: 0.000016 - momentum: 0.000000
2023-10-23 20:40:27,948 epoch 8 - iter 132/447 - loss 0.01095721 - time (sec): 12.27 - samples/sec: 2125.10 - lr: 0.000015 - momentum: 0.000000
2023-10-23 20:40:31,658 epoch 8 - iter 176/447 - loss 0.01028475 - time (sec): 15.98 - samples/sec: 2150.30 - lr: 0.000015 - momentum: 0.000000
2023-10-23 20:40:35,750 epoch 8 - iter 220/447 - loss 0.00902436 - time (sec): 20.07 - samples/sec: 2145.08 - lr: 0.000014 - momentum: 0.000000
2023-10-23 20:40:39,401 epoch 8 - iter 264/447 - loss 0.00849857 - time (sec): 23.72 - samples/sec: 2135.50 - lr: 0.000013 - momentum: 0.000000
2023-10-23 20:40:43,377 epoch 8 - iter 308/447 - loss 0.00962661 - time (sec): 27.70 - samples/sec: 2130.49 - lr: 0.000013 - momentum: 0.000000
2023-10-23 20:40:47,327 epoch 8 - iter 352/447 - loss 0.01052978 - time (sec): 31.65 - samples/sec: 2137.57 - lr: 0.000012 - momentum: 0.000000
2023-10-23 20:40:51,295 epoch 8 - iter 396/447 - loss 0.01131149 - time (sec): 35.62 - samples/sec: 2139.29 - lr: 0.000012 - momentum: 0.000000
2023-10-23 20:40:55,580 epoch 8 - iter 440/447 - loss 0.01174824 - time (sec): 39.90 - samples/sec: 2136.08 - lr: 0.000011 - momentum: 0.000000
2023-10-23 20:40:56,194 ----------------------------------------------------------------------------------------------------
2023-10-23 20:40:56,194 EPOCH 8 done: loss 0.0120 - lr: 0.000011
2023-10-23 20:41:02,415 DEV : loss 0.2459404468536377 - f1-score (micro avg) 0.7574
2023-10-23 20:41:02,436 ----------------------------------------------------------------------------------------------------
2023-10-23 20:41:06,866 epoch 9 - iter 44/447 - loss 0.03433726 - time (sec): 4.43 - samples/sec: 2016.40 - lr: 0.000011 - momentum: 0.000000
2023-10-23 20:41:11,137 epoch 9 - iter 88/447 - loss 0.03002005 - time (sec): 8.70 - samples/sec: 2071.75 - lr: 0.000010 - momentum: 0.000000
2023-10-23 20:41:15,202 epoch 9 - iter 132/447 - loss 0.02907463 - time (sec): 12.77 - samples/sec: 2064.68 - lr: 0.000010 - momentum: 0.000000
2023-10-23 20:41:18,874 epoch 9 - iter 176/447 - loss 0.02816834 - time (sec): 16.44 - samples/sec: 2056.26 - lr: 0.000009 - momentum: 0.000000
2023-10-23 20:41:22,601 epoch 9 - iter 220/447 - loss 0.02425467 - time (sec): 20.16 - samples/sec: 2068.35 - lr: 0.000008 - momentum: 0.000000
2023-10-23 20:41:26,685 epoch 9 - iter 264/447 - loss 0.02386885 - time (sec): 24.25 - samples/sec: 2078.56 - lr: 0.000008 - momentum: 0.000000
2023-10-23 20:41:30,383 epoch 9 - iter 308/447 - loss 0.02183922 - time (sec): 27.95 - samples/sec: 2096.13 - lr: 0.000007 - momentum: 0.000000
2023-10-23 20:41:34,996 epoch 9 - iter 352/447 - loss 0.02186201 - time (sec): 32.56 - samples/sec: 2127.54 - lr: 0.000007 - momentum: 0.000000
2023-10-23 20:41:38,749 epoch 9 - iter 396/447 - loss 0.02183062 - time (sec): 36.31 - samples/sec: 2134.87 - lr: 0.000006 - momentum: 0.000000
2023-10-23 20:41:42,489 epoch 9 - iter 440/447 - loss 0.02414038 - time (sec): 40.05 - samples/sec: 2131.89 - lr: 0.000006 - momentum: 0.000000
2023-10-23 20:41:43,068 ----------------------------------------------------------------------------------------------------
2023-10-23 20:41:43,068 EPOCH 9 done: loss 0.0241 - lr: 0.000006
2023-10-23 20:41:49,270 DEV : loss 0.277804434299469 - f1-score (micro avg) 0.6578
2023-10-23 20:41:49,290 ----------------------------------------------------------------------------------------------------
2023-10-23 20:41:53,441 epoch 10 - iter 44/447 - loss 0.07121556 - time (sec): 4.15 - samples/sec: 2054.74 - lr: 0.000005 - momentum: 0.000000
2023-10-23 20:41:57,288 epoch 10 - iter 88/447 - loss 0.06814730 - time (sec): 8.00 - samples/sec: 2138.55 - lr: 0.000005 - momentum: 0.000000
2023-10-23 20:42:01,098 epoch 10 - iter 132/447 - loss 0.06359458 - time (sec): 11.81 - samples/sec: 2134.13 - lr: 0.000004 - momentum: 0.000000
2023-10-23 20:42:05,202 epoch 10 - iter 176/447 - loss 0.06000699 - time (sec): 15.91 - samples/sec: 2094.21 - lr: 0.000003 - momentum: 0.000000
2023-10-23 20:42:09,004 epoch 10 - iter 220/447 - loss 0.05728519 - time (sec): 19.71 - samples/sec: 2098.42 - lr: 0.000003 - momentum: 0.000000
2023-10-23 20:42:12,907 epoch 10 - iter 264/447 - loss 0.05401101 - time (sec): 23.62 - samples/sec: 2103.75 - lr: 0.000002 - momentum: 0.000000
2023-10-23 20:42:16,768 epoch 10 - iter 308/447 - loss 0.05440301 - time (sec): 27.48 - samples/sec: 2099.70 - lr: 0.000002 - momentum: 0.000000
2023-10-23 20:42:20,489 epoch 10 - iter 352/447 - loss 0.05284119 - time (sec): 31.20 - samples/sec: 2115.34 - lr: 0.000001 - momentum: 0.000000
2023-10-23 20:42:25,162 epoch 10 - iter 396/447 - loss 0.05118931 - time (sec): 35.87 - samples/sec: 2132.18 - lr: 0.000001 - momentum: 0.000000
2023-10-23 20:42:28,992 epoch 10 - iter 440/447 - loss 0.05044650 - time (sec): 39.70 - samples/sec: 2128.66 - lr: 0.000000 - momentum: 0.000000
2023-10-23 20:42:29,908 ----------------------------------------------------------------------------------------------------
2023-10-23 20:42:29,908 EPOCH 10 done: loss 0.0503 - lr: 0.000000
2023-10-23 20:42:36,109 DEV : loss 0.2561439871788025 - f1-score (micro avg) 0.6432
2023-10-23 20:42:36,679 ----------------------------------------------------------------------------------------------------
2023-10-23 20:42:36,680 Loading model from best epoch ...
2023-10-23 20:42:38,467 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
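The saved best-model.pt can be loaded for inference in the usual Flair way; the German sentence below is only an illustrative example:

from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-1/best-model.pt"
)

sentence = Sentence("Der Kanton Zürich liegt in der Schweiz .")
tagger.predict(sentence)
for span in sentence.get_spans("ner"):
    print(span)  # spans decoded from the BIOES tags listed above, e.g. "Zürich" -> loc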
2023-10-23 20:42:43,278
Results:
- F-score (micro) 0.7458
- F-score (macro) 0.6623
- Accuracy 0.6117
By class:
              precision    recall  f1-score   support

         loc     0.8300    0.8356    0.8328       596
        pers     0.6711    0.7598    0.7127       333
         org     0.5688    0.4697    0.5145       132
        prod     0.7297    0.4091    0.5243        66
        time     0.7200    0.7347    0.7273        49

   micro avg     0.7468    0.7449    0.7458      1176
   macro avg     0.7039    0.6418    0.6623      1176
weighted avg     0.7455    0.7449    0.7413      1176
2023-10-23 20:42:43,279 ----------------------------------------------------------------------------------------------------
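The table above could be reproduced from the saved model with Flair's evaluate API. A sketch, assuming the argument names of recent Flair releases:

# Evaluate the best checkpoint on the held-out test split.
result = tagger.evaluate(
    corpus.test,
    gold_label_type="ner",
    mini_batch_size=8,
)
print(result.detailed_results)  # per-class precision/recall/F1, as in the table above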