2023-10-23 22:38:32,825 ----------------------------------------------------------------------------------------------------
2023-10-23 22:38:32,826 Model: "SequenceTagger(
(embeddings): TransformerWordEmbeddings(
(model): BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(64001, 768)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(1): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(2): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(3): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(4): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(5): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(6): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(7): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(8): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(9): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(10): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(11): BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(pooler): BertPooler(
(dense): Linear(in_features=768, out_features=768, bias=True)
(activation): Tanh()
)
)
)
(locked_dropout): LockedDropout(p=0.5)
(linear): Linear(in_features=768, out_features=21, bias=True)
(loss_function): CrossEntropyLoss()
)"
2023-10-23 22:38:32,826 ----------------------------------------------------------------------------------------------------
2023-10-23 22:38:32,827 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
- NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /home/ubuntu/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
2023-10-23 22:38:32,827 ----------------------------------------------------------------------------------------------------
2023-10-23 22:38:32,827 Train: 3575 sentences
2023-10-23 22:38:32,827 (train_with_dev=False, train_with_test=False)
2023-10-23 22:38:32,827 ----------------------------------------------------------------------------------------------------
2023-10-23 22:38:32,827 Training Params:
2023-10-23 22:38:32,827 - learning_rate: "3e-05"
2023-10-23 22:38:32,827 - mini_batch_size: "4"
2023-10-23 22:38:32,827 - max_epochs: "10"
2023-10-23 22:38:32,827 - shuffle: "True"
2023-10-23 22:38:32,827 ----------------------------------------------------------------------------------------------------
2023-10-23 22:38:32,827 Plugins:
2023-10-23 22:38:32,827 - TensorboardLogger
2023-10-23 22:38:32,827 - LinearScheduler | warmup_fraction: '0.1'
2023-10-23 22:38:32,827 ----------------------------------------------------------------------------------------------------
2023-10-23 22:38:32,827 Final evaluation on model from best epoch (best-model.pt)
2023-10-23 22:38:32,827 - metric: "('micro avg', 'f1-score')"
2023-10-23 22:38:32,827 ----------------------------------------------------------------------------------------------------
2023-10-23 22:38:32,827 Computation:
2023-10-23 22:38:32,827 - compute on device: cuda:0
2023-10-23 22:38:32,827 - embedding storage: none
2023-10-23 22:38:32,827 ----------------------------------------------------------------------------------------------------
2023-10-23 22:38:32,827 Model training base path: "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-5"
2023-10-23 22:38:32,827 ----------------------------------------------------------------------------------------------------
2023-10-23 22:38:32,827 ----------------------------------------------------------------------------------------------------
2023-10-23 22:38:32,827 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-23 22:38:38,597 epoch 1 - iter 89/894 - loss 2.19658834 - time (sec): 5.77 - samples/sec: 1565.10 - lr: 0.000003 - momentum: 0.000000
2023-10-23 22:38:44,180 epoch 1 - iter 178/894 - loss 1.42684353 - time (sec): 11.35 - samples/sec: 1545.34 - lr: 0.000006 - momentum: 0.000000
2023-10-23 22:38:49,819 epoch 1 - iter 267/894 - loss 1.10437214 - time (sec): 16.99 - samples/sec: 1550.56 - lr: 0.000009 - momentum: 0.000000
2023-10-23 22:38:55,340 epoch 1 - iter 356/894 - loss 0.92546222 - time (sec): 22.51 - samples/sec: 1547.85 - lr: 0.000012 - momentum: 0.000000
2023-10-23 22:39:01,285 epoch 1 - iter 445/894 - loss 0.79270530 - time (sec): 28.46 - samples/sec: 1551.58 - lr: 0.000015 - momentum: 0.000000
2023-10-23 22:39:06,846 epoch 1 - iter 534/894 - loss 0.70887847 - time (sec): 34.02 - samples/sec: 1537.26 - lr: 0.000018 - momentum: 0.000000
2023-10-23 22:39:12,449 epoch 1 - iter 623/894 - loss 0.64668937 - time (sec): 39.62 - samples/sec: 1525.30 - lr: 0.000021 - momentum: 0.000000
2023-10-23 22:39:17,955 epoch 1 - iter 712/894 - loss 0.59748589 - time (sec): 45.13 - samples/sec: 1509.33 - lr: 0.000024 - momentum: 0.000000
2023-10-23 22:39:23,729 epoch 1 - iter 801/894 - loss 0.55098950 - time (sec): 50.90 - samples/sec: 1523.94 - lr: 0.000027 - momentum: 0.000000
2023-10-23 22:39:29,420 epoch 1 - iter 890/894 - loss 0.51190763 - time (sec): 56.59 - samples/sec: 1524.69 - lr: 0.000030 - momentum: 0.000000
2023-10-23 22:39:29,659 ----------------------------------------------------------------------------------------------------
2023-10-23 22:39:29,659 EPOCH 1 done: loss 0.5116 - lr: 0.000030
2023-10-23 22:39:34,496 DEV : loss 0.17914246022701263 - f1-score (micro avg) 0.6504
2023-10-23 22:39:34,517 saving best model
2023-10-23 22:39:34,994 ----------------------------------------------------------------------------------------------------
2023-10-23 22:39:40,776 epoch 2 - iter 89/894 - loss 0.14849414 - time (sec): 5.78 - samples/sec: 1484.55 - lr: 0.000030 - momentum: 0.000000
2023-10-23 22:39:46,513 epoch 2 - iter 178/894 - loss 0.16030660 - time (sec): 11.52 - samples/sec: 1523.25 - lr: 0.000029 - momentum: 0.000000
2023-10-23 22:39:52,146 epoch 2 - iter 267/894 - loss 0.15694845 - time (sec): 17.15 - samples/sec: 1506.24 - lr: 0.000029 - momentum: 0.000000
2023-10-23 22:39:57,974 epoch 2 - iter 356/894 - loss 0.15586715 - time (sec): 22.98 - samples/sec: 1519.90 - lr: 0.000029 - momentum: 0.000000
2023-10-23 22:40:03,546 epoch 2 - iter 445/894 - loss 0.15381150 - time (sec): 28.55 - samples/sec: 1516.02 - lr: 0.000028 - momentum: 0.000000
2023-10-23 22:40:09,151 epoch 2 - iter 534/894 - loss 0.15159799 - time (sec): 34.16 - samples/sec: 1516.82 - lr: 0.000028 - momentum: 0.000000
2023-10-23 22:40:14,705 epoch 2 - iter 623/894 - loss 0.14575727 - time (sec): 39.71 - samples/sec: 1514.08 - lr: 0.000028 - momentum: 0.000000
2023-10-23 22:40:20,441 epoch 2 - iter 712/894 - loss 0.14248398 - time (sec): 45.45 - samples/sec: 1525.30 - lr: 0.000027 - momentum: 0.000000
2023-10-23 22:40:26,037 epoch 2 - iter 801/894 - loss 0.14143481 - time (sec): 51.04 - samples/sec: 1519.62 - lr: 0.000027 - momentum: 0.000000
2023-10-23 22:40:31,697 epoch 2 - iter 890/894 - loss 0.13933960 - time (sec): 56.70 - samples/sec: 1520.47 - lr: 0.000027 - momentum: 0.000000
2023-10-23 22:40:31,937 ----------------------------------------------------------------------------------------------------
2023-10-23 22:40:31,938 EPOCH 2 done: loss 0.1397 - lr: 0.000027
2023-10-23 22:40:38,442 DEV : loss 0.161593958735466 - f1-score (micro avg) 0.6957
2023-10-23 22:40:38,463 saving best model
2023-10-23 22:40:39,056 ----------------------------------------------------------------------------------------------------
2023-10-23 22:40:44,992 epoch 3 - iter 89/894 - loss 0.09507660 - time (sec): 5.93 - samples/sec: 1626.66 - lr: 0.000026 - momentum: 0.000000
2023-10-23 22:40:50,634 epoch 3 - iter 178/894 - loss 0.08959201 - time (sec): 11.58 - samples/sec: 1616.43 - lr: 0.000026 - momentum: 0.000000
2023-10-23 22:40:56,328 epoch 3 - iter 267/894 - loss 0.08944180 - time (sec): 17.27 - samples/sec: 1594.58 - lr: 0.000026 - momentum: 0.000000
2023-10-23 22:41:01,809 epoch 3 - iter 356/894 - loss 0.08534777 - time (sec): 22.75 - samples/sec: 1562.15 - lr: 0.000025 - momentum: 0.000000
2023-10-23 22:41:07,752 epoch 3 - iter 445/894 - loss 0.08007011 - time (sec): 28.69 - samples/sec: 1566.08 - lr: 0.000025 - momentum: 0.000000
2023-10-23 22:41:13,689 epoch 3 - iter 534/894 - loss 0.07925379 - time (sec): 34.63 - samples/sec: 1570.82 - lr: 0.000025 - momentum: 0.000000
2023-10-23 22:41:19,195 epoch 3 - iter 623/894 - loss 0.08066880 - time (sec): 40.14 - samples/sec: 1549.12 - lr: 0.000024 - momentum: 0.000000
2023-10-23 22:41:24,702 epoch 3 - iter 712/894 - loss 0.08288875 - time (sec): 45.64 - samples/sec: 1530.74 - lr: 0.000024 - momentum: 0.000000
2023-10-23 22:41:30,366 epoch 3 - iter 801/894 - loss 0.08332226 - time (sec): 51.31 - samples/sec: 1533.41 - lr: 0.000024 - momentum: 0.000000
2023-10-23 22:41:35,789 epoch 3 - iter 890/894 - loss 0.08304645 - time (sec): 56.73 - samples/sec: 1518.37 - lr: 0.000023 - momentum: 0.000000
2023-10-23 22:41:36,035 ----------------------------------------------------------------------------------------------------
2023-10-23 22:41:36,035 EPOCH 3 done: loss 0.0830 - lr: 0.000023
2023-10-23 22:41:42,527 DEV : loss 0.1837383210659027 - f1-score (micro avg) 0.742
2023-10-23 22:41:42,548 saving best model
2023-10-23 22:41:43,143 ----------------------------------------------------------------------------------------------------
2023-10-23 22:41:48,721 epoch 4 - iter 89/894 - loss 0.04033953 - time (sec): 5.58 - samples/sec: 1519.97 - lr: 0.000023 - momentum: 0.000000
2023-10-23 22:41:54,565 epoch 4 - iter 178/894 - loss 0.04576950 - time (sec): 11.42 - samples/sec: 1554.52 - lr: 0.000023 - momentum: 0.000000
2023-10-23 22:42:00,391 epoch 4 - iter 267/894 - loss 0.04855522 - time (sec): 17.25 - samples/sec: 1548.91 - lr: 0.000022 - momentum: 0.000000
2023-10-23 22:42:05,867 epoch 4 - iter 356/894 - loss 0.04981395 - time (sec): 22.72 - samples/sec: 1511.43 - lr: 0.000022 - momentum: 0.000000
2023-10-23 22:42:11,932 epoch 4 - iter 445/894 - loss 0.05343608 - time (sec): 28.79 - samples/sec: 1527.14 - lr: 0.000022 - momentum: 0.000000
2023-10-23 22:42:17,488 epoch 4 - iter 534/894 - loss 0.05227533 - time (sec): 34.34 - samples/sec: 1512.25 - lr: 0.000021 - momentum: 0.000000
2023-10-23 22:42:23,203 epoch 4 - iter 623/894 - loss 0.05475064 - time (sec): 40.06 - samples/sec: 1530.47 - lr: 0.000021 - momentum: 0.000000
2023-10-23 22:42:28,803 epoch 4 - iter 712/894 - loss 0.05378347 - time (sec): 45.66 - samples/sec: 1526.79 - lr: 0.000021 - momentum: 0.000000
2023-10-23 22:42:34,361 epoch 4 - iter 801/894 - loss 0.05333509 - time (sec): 51.22 - samples/sec: 1525.65 - lr: 0.000020 - momentum: 0.000000
2023-10-23 22:42:39,890 epoch 4 - iter 890/894 - loss 0.05299308 - time (sec): 56.75 - samples/sec: 1517.37 - lr: 0.000020 - momentum: 0.000000
2023-10-23 22:42:40,145 ----------------------------------------------------------------------------------------------------
2023-10-23 22:42:40,145 EPOCH 4 done: loss 0.0530 - lr: 0.000020
2023-10-23 22:42:46,604 DEV : loss 0.21632781624794006 - f1-score (micro avg) 0.7595
2023-10-23 22:42:46,624 saving best model
2023-10-23 22:42:47,215 ----------------------------------------------------------------------------------------------------
2023-10-23 22:42:52,801 epoch 5 - iter 89/894 - loss 0.03559663 - time (sec): 5.59 - samples/sec: 1548.30 - lr: 0.000020 - momentum: 0.000000
2023-10-23 22:42:58,651 epoch 5 - iter 178/894 - loss 0.03236078 - time (sec): 11.43 - samples/sec: 1544.48 - lr: 0.000019 - momentum: 0.000000
2023-10-23 22:43:04,179 epoch 5 - iter 267/894 - loss 0.02994001 - time (sec): 16.96 - samples/sec: 1524.48 - lr: 0.000019 - momentum: 0.000000
2023-10-23 22:43:09,721 epoch 5 - iter 356/894 - loss 0.03122186 - time (sec): 22.51 - samples/sec: 1516.21 - lr: 0.000019 - momentum: 0.000000
2023-10-23 22:43:15,445 epoch 5 - iter 445/894 - loss 0.03227968 - time (sec): 28.23 - samples/sec: 1509.16 - lr: 0.000018 - momentum: 0.000000
2023-10-23 22:43:21,155 epoch 5 - iter 534/894 - loss 0.03088626 - time (sec): 33.94 - samples/sec: 1507.36 - lr: 0.000018 - momentum: 0.000000
2023-10-23 22:43:26,666 epoch 5 - iter 623/894 - loss 0.03520768 - time (sec): 39.45 - samples/sec: 1507.71 - lr: 0.000018 - momentum: 0.000000
2023-10-23 22:43:32,555 epoch 5 - iter 712/894 - loss 0.03651354 - time (sec): 45.34 - samples/sec: 1516.80 - lr: 0.000017 - momentum: 0.000000
2023-10-23 22:43:38,120 epoch 5 - iter 801/894 - loss 0.03535800 - time (sec): 50.90 - samples/sec: 1521.88 - lr: 0.000017 - momentum: 0.000000
2023-10-23 22:43:43,750 epoch 5 - iter 890/894 - loss 0.03467838 - time (sec): 56.53 - samples/sec: 1522.51 - lr: 0.000017 - momentum: 0.000000
2023-10-23 22:43:44,017 ----------------------------------------------------------------------------------------------------
2023-10-23 22:43:44,017 EPOCH 5 done: loss 0.0346 - lr: 0.000017
2023-10-23 22:43:50,506 DEV : loss 0.23189181089401245 - f1-score (micro avg) 0.7573
2023-10-23 22:43:50,527 ----------------------------------------------------------------------------------------------------
2023-10-23 22:43:56,264 epoch 6 - iter 89/894 - loss 0.02532279 - time (sec): 5.74 - samples/sec: 1466.77 - lr: 0.000016 - momentum: 0.000000
2023-10-23 22:44:02,035 epoch 6 - iter 178/894 - loss 0.02807963 - time (sec): 11.51 - samples/sec: 1485.35 - lr: 0.000016 - momentum: 0.000000
2023-10-23 22:44:08,080 epoch 6 - iter 267/894 - loss 0.02735131 - time (sec): 17.55 - samples/sec: 1533.60 - lr: 0.000016 - momentum: 0.000000
2023-10-23 22:44:13,605 epoch 6 - iter 356/894 - loss 0.02484311 - time (sec): 23.08 - samples/sec: 1526.45 - lr: 0.000015 - momentum: 0.000000
2023-10-23 22:44:19,179 epoch 6 - iter 445/894 - loss 0.02606946 - time (sec): 28.65 - samples/sec: 1520.66 - lr: 0.000015 - momentum: 0.000000
2023-10-23 22:44:24,833 epoch 6 - iter 534/894 - loss 0.02445808 - time (sec): 34.31 - samples/sec: 1517.91 - lr: 0.000015 - momentum: 0.000000
2023-10-23 22:44:30,281 epoch 6 - iter 623/894 - loss 0.02488802 - time (sec): 39.75 - samples/sec: 1507.43 - lr: 0.000014 - momentum: 0.000000
2023-10-23 22:44:35,737 epoch 6 - iter 712/894 - loss 0.02351730 - time (sec): 45.21 - samples/sec: 1503.02 - lr: 0.000014 - momentum: 0.000000
2023-10-23 22:44:41,459 epoch 6 - iter 801/894 - loss 0.02489985 - time (sec): 50.93 - samples/sec: 1508.61 - lr: 0.000014 - momentum: 0.000000
2023-10-23 22:44:47,149 epoch 6 - iter 890/894 - loss 0.02565738 - time (sec): 56.62 - samples/sec: 1522.00 - lr: 0.000013 - momentum: 0.000000
2023-10-23 22:44:47,395 ----------------------------------------------------------------------------------------------------
2023-10-23 22:44:47,395 EPOCH 6 done: loss 0.0256 - lr: 0.000013
2023-10-23 22:44:53,878 DEV : loss 0.22867470979690552 - f1-score (micro avg) 0.7792
2023-10-23 22:44:53,899 saving best model
2023-10-23 22:44:54,488 ----------------------------------------------------------------------------------------------------
2023-10-23 22:45:00,386 epoch 7 - iter 89/894 - loss 0.01077021 - time (sec): 5.90 - samples/sec: 1560.90 - lr: 0.000013 - momentum: 0.000000
2023-10-23 22:45:06,031 epoch 7 - iter 178/894 - loss 0.01430220 - time (sec): 11.54 - samples/sec: 1545.05 - lr: 0.000013 - momentum: 0.000000
2023-10-23 22:45:11,563 epoch 7 - iter 267/894 - loss 0.01639112 - time (sec): 17.07 - samples/sec: 1527.65 - lr: 0.000012 - momentum: 0.000000
2023-10-23 22:45:17,350 epoch 7 - iter 356/894 - loss 0.01528920 - time (sec): 22.86 - samples/sec: 1562.40 - lr: 0.000012 - momentum: 0.000000
2023-10-23 22:45:22,971 epoch 7 - iter 445/894 - loss 0.01559562 - time (sec): 28.48 - samples/sec: 1541.88 - lr: 0.000012 - momentum: 0.000000
2023-10-23 22:45:28,693 epoch 7 - iter 534/894 - loss 0.01482951 - time (sec): 34.20 - samples/sec: 1533.75 - lr: 0.000011 - momentum: 0.000000
2023-10-23 22:45:34,323 epoch 7 - iter 623/894 - loss 0.01485682 - time (sec): 39.83 - samples/sec: 1534.43 - lr: 0.000011 - momentum: 0.000000
2023-10-23 22:45:40,101 epoch 7 - iter 712/894 - loss 0.01527614 - time (sec): 45.61 - samples/sec: 1541.19 - lr: 0.000011 - momentum: 0.000000
2023-10-23 22:45:45,810 epoch 7 - iter 801/894 - loss 0.01551461 - time (sec): 51.32 - samples/sec: 1536.03 - lr: 0.000010 - momentum: 0.000000
2023-10-23 22:45:51,180 epoch 7 - iter 890/894 - loss 0.01536354 - time (sec): 56.69 - samples/sec: 1520.76 - lr: 0.000010 - momentum: 0.000000
2023-10-23 22:45:51,418 ----------------------------------------------------------------------------------------------------
2023-10-23 22:45:51,419 EPOCH 7 done: loss 0.0155 - lr: 0.000010
2023-10-23 22:45:57,916 DEV : loss 0.28836268186569214 - f1-score (micro avg) 0.7771
2023-10-23 22:45:57,937 ----------------------------------------------------------------------------------------------------
2023-10-23 22:46:03,516 epoch 8 - iter 89/894 - loss 0.01163123 - time (sec): 5.58 - samples/sec: 1536.51 - lr: 0.000010 - momentum: 0.000000
2023-10-23 22:46:09,438 epoch 8 - iter 178/894 - loss 0.00810689 - time (sec): 11.50 - samples/sec: 1548.67 - lr: 0.000009 - momentum: 0.000000
2023-10-23 22:46:14,981 epoch 8 - iter 267/894 - loss 0.00784824 - time (sec): 17.04 - samples/sec: 1538.69 - lr: 0.000009 - momentum: 0.000000
2023-10-23 22:46:20,608 epoch 8 - iter 356/894 - loss 0.01197846 - time (sec): 22.67 - samples/sec: 1508.98 - lr: 0.000009 - momentum: 0.000000
2023-10-23 22:46:26,221 epoch 8 - iter 445/894 - loss 0.01161327 - time (sec): 28.28 - samples/sec: 1506.75 - lr: 0.000008 - momentum: 0.000000
2023-10-23 22:46:31,703 epoch 8 - iter 534/894 - loss 0.01039741 - time (sec): 33.77 - samples/sec: 1501.93 - lr: 0.000008 - momentum: 0.000000
2023-10-23 22:46:37,545 epoch 8 - iter 623/894 - loss 0.01046625 - time (sec): 39.61 - samples/sec: 1515.15 - lr: 0.000008 - momentum: 0.000000
2023-10-23 22:46:43,449 epoch 8 - iter 712/894 - loss 0.01054270 - time (sec): 45.51 - samples/sec: 1523.09 - lr: 0.000007 - momentum: 0.000000
2023-10-23 22:46:49,045 epoch 8 - iter 801/894 - loss 0.01117368 - time (sec): 51.11 - samples/sec: 1527.30 - lr: 0.000007 - momentum: 0.000000
2023-10-23 22:46:54,587 epoch 8 - iter 890/894 - loss 0.01090378 - time (sec): 56.65 - samples/sec: 1522.06 - lr: 0.000007 - momentum: 0.000000
2023-10-23 22:46:54,833 ----------------------------------------------------------------------------------------------------
2023-10-23 22:46:54,834 EPOCH 8 done: loss 0.0112 - lr: 0.000007
2023-10-23 22:47:01,339 DEV : loss 0.2503233850002289 - f1-score (micro avg) 0.7723
2023-10-23 22:47:01,359 ----------------------------------------------------------------------------------------------------
2023-10-23 22:47:06,960 epoch 9 - iter 89/894 - loss 0.00515796 - time (sec): 5.60 - samples/sec: 1498.16 - lr: 0.000006 - momentum: 0.000000
2023-10-23 22:47:12,763 epoch 9 - iter 178/894 - loss 0.00448032 - time (sec): 11.40 - samples/sec: 1518.92 - lr: 0.000006 - momentum: 0.000000
2023-10-23 22:47:18,481 epoch 9 - iter 267/894 - loss 0.00703261 - time (sec): 17.12 - samples/sec: 1546.46 - lr: 0.000006 - momentum: 0.000000
2023-10-23 22:47:24,038 epoch 9 - iter 356/894 - loss 0.00706641 - time (sec): 22.68 - samples/sec: 1527.96 - lr: 0.000005 - momentum: 0.000000
2023-10-23 22:47:29,594 epoch 9 - iter 445/894 - loss 0.00762180 - time (sec): 28.23 - samples/sec: 1524.64 - lr: 0.000005 - momentum: 0.000000
2023-10-23 22:47:35,235 epoch 9 - iter 534/894 - loss 0.00723271 - time (sec): 33.88 - samples/sec: 1515.45 - lr: 0.000005 - momentum: 0.000000
2023-10-23 22:47:40,790 epoch 9 - iter 623/894 - loss 0.00692797 - time (sec): 39.43 - samples/sec: 1515.79 - lr: 0.000004 - momentum: 0.000000
2023-10-23 22:47:46,801 epoch 9 - iter 712/894 - loss 0.00622829 - time (sec): 45.44 - samples/sec: 1533.09 - lr: 0.000004 - momentum: 0.000000
2023-10-23 22:47:52,292 epoch 9 - iter 801/894 - loss 0.00704424 - time (sec): 50.93 - samples/sec: 1521.27 - lr: 0.000004 - momentum: 0.000000
2023-10-23 22:47:57,969 epoch 9 - iter 890/894 - loss 0.00672045 - time (sec): 56.61 - samples/sec: 1519.79 - lr: 0.000003 - momentum: 0.000000
2023-10-23 22:47:58,222 ----------------------------------------------------------------------------------------------------
2023-10-23 22:47:58,223 EPOCH 9 done: loss 0.0068 - lr: 0.000003
2023-10-23 22:48:04,439 DEV : loss 0.2725300192832947 - f1-score (micro avg) 0.7826
2023-10-23 22:48:04,460 saving best model
2023-10-23 22:48:05,049 ----------------------------------------------------------------------------------------------------
2023-10-23 22:48:10,568 epoch 10 - iter 89/894 - loss 0.00606623 - time (sec): 5.52 - samples/sec: 1456.07 - lr: 0.000003 - momentum: 0.000000
2023-10-23 22:48:16,338 epoch 10 - iter 178/894 - loss 0.00324980 - time (sec): 11.29 - samples/sec: 1424.87 - lr: 0.000003 - momentum: 0.000000
2023-10-23 22:48:22,097 epoch 10 - iter 267/894 - loss 0.00269239 - time (sec): 17.05 - samples/sec: 1471.54 - lr: 0.000002 - momentum: 0.000000
2023-10-23 22:48:27,990 epoch 10 - iter 356/894 - loss 0.00292304 - time (sec): 22.94 - samples/sec: 1490.43 - lr: 0.000002 - momentum: 0.000000
2023-10-23 22:48:33,804 epoch 10 - iter 445/894 - loss 0.00241672 - time (sec): 28.75 - samples/sec: 1512.69 - lr: 0.000002 - momentum: 0.000000
2023-10-23 22:48:39,393 epoch 10 - iter 534/894 - loss 0.00222088 - time (sec): 34.34 - samples/sec: 1505.01 - lr: 0.000001 - momentum: 0.000000
2023-10-23 22:48:44,925 epoch 10 - iter 623/894 - loss 0.00269017 - time (sec): 39.88 - samples/sec: 1496.46 - lr: 0.000001 - momentum: 0.000000
2023-10-23 22:48:50,533 epoch 10 - iter 712/894 - loss 0.00259172 - time (sec): 45.48 - samples/sec: 1504.68 - lr: 0.000001 - momentum: 0.000000
2023-10-23 22:48:56,193 epoch 10 - iter 801/894 - loss 0.00296873 - time (sec): 51.14 - samples/sec: 1508.81 - lr: 0.000000 - momentum: 0.000000
2023-10-23 22:49:02,060 epoch 10 - iter 890/894 - loss 0.00316216 - time (sec): 57.01 - samples/sec: 1510.20 - lr: 0.000000 - momentum: 0.000000
2023-10-23 22:49:02,310 ----------------------------------------------------------------------------------------------------
2023-10-23 22:49:02,310 EPOCH 10 done: loss 0.0031 - lr: 0.000000
2023-10-23 22:49:08,530 DEV : loss 0.2750208377838135 - f1-score (micro avg) 0.7822
2023-10-23 22:49:09,022 ----------------------------------------------------------------------------------------------------
2023-10-23 22:49:09,023 Loading model from best epoch ...
2023-10-23 22:49:10,707 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
2023-10-23 22:49:15,538
Results:
- F-score (micro) 0.7568
- F-score (macro) 0.6789
- Accuracy 0.6261
By class:
              precision    recall  f1-score   support

         loc     0.8285    0.8591    0.8435       596
        pers     0.6838    0.7598    0.7198       333
         org     0.5702    0.5227    0.5455       132
        prod     0.6531    0.4848    0.5565        66
        time     0.7447    0.7143    0.7292        49

   micro avg     0.7477    0.7662    0.7568      1176
   macro avg     0.6961    0.6681    0.6789      1176
weighted avg     0.7452    0.7662    0.7541      1176
2023-10-23 22:49:15,538 ----------------------------------------------------------------------------------------------------
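The summary scores in the final results can be cross-checked against the per-class rows. The sketch below is not part of the Flair output: it is a minimal, assumed reconstruction using the standard definitions (macro F1 as the unweighted mean of per-class F1, micro recall as the support-weighted mean of per-class recall, F1 as the harmonic mean of precision and recall). The names `by_class`, `macro_f1`, `support_weighted`, and `f1` are hypothetical helpers, and the micro precision is copied from the log because the table does not list predicted-span counts.

```python
# Per-class rows copied from the "By class" table in the log:
# label -> (precision, recall, f1, support)
by_class = {
    "loc":  (0.8285, 0.8591, 0.8435, 596),
    "pers": (0.6838, 0.7598, 0.7198, 333),
    "org":  (0.5702, 0.5227, 0.5455, 132),
    "prod": (0.6531, 0.4848, 0.5565, 66),
    "time": (0.7447, 0.7143, 0.7292, 49),
}

def macro_f1(rows):
    """Unweighted mean of the per-class F1 scores."""
    return sum(row[2] for row in rows.values()) / len(rows)

def support_weighted(rows, column):
    """Mean of one column, weighted by each class's support."""
    total = sum(row[3] for row in rows.values())
    return sum(row[column] * row[3] for row in rows.values()) / total

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

macro = macro_f1(by_class)
# Micro recall equals the support-weighted recall (total TP / total gold spans).
micro_recall = support_weighted(by_class, 1)
# Micro precision (0.7477) is taken from the log, not derived here.
micro = f1(0.7477, micro_recall)

print(f"macro F1     ~ {macro:.4f}")
print(f"micro recall ~ {micro_recall:.4f}")
print(f"micro F1     ~ {micro:.4f}")
```

Because the inputs are already rounded to four decimals, the recomputed values should agree with the logged ones only up to small rounding differences.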