2023-10-23 21:51:48,190 ----------------------------------------------------------------------------------------------------
2023-10-23 21:51:48,191 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-23 21:51:48,191 ----------------------------------------------------------------------------------------------------
2023-10-23 21:51:48,191 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
- NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /home/ubuntu/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
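
The corpus and its label dictionary can be loaded in a few lines; a sketch assuming the NER_HIPE_2022 loader's dataset_name/language/add_document_separator arguments (the "with_doc_seperator" cache path above suggests document separators were kept):

from flair.datasets import NER_HIPE_2022

corpus = NER_HIPE_2022(
    dataset_name="hipe2020",
    language="de",
    add_document_separator=True,
)
print(corpus)  # expected: 3575 train + 1235 dev + 1266 test sentences

# The 21-tag BIOES dictionary used by the tagger sketched above.
label_dict = corpus.make_label_dictionary(label_type="ner")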
2023-10-23 21:51:48,191 ----------------------------------------------------------------------------------------------------
2023-10-23 21:51:48,191 Train: 3575 sentences
2023-10-23 21:51:48,191 (train_with_dev=False, train_with_test=False)
2023-10-23 21:51:48,191 ----------------------------------------------------------------------------------------------------
2023-10-23 21:51:48,191 Training Params:
2023-10-23 21:51:48,191 - learning_rate: "5e-05"
2023-10-23 21:51:48,191 - mini_batch_size: "8"
2023-10-23 21:51:48,191 - max_epochs: "10"
2023-10-23 21:51:48,191 - shuffle: "True"
2023-10-23 21:51:48,191 ----------------------------------------------------------------------------------------------------
2023-10-23 21:51:48,191 Plugins:
2023-10-23 21:51:48,191 - TensorboardLogger
2023-10-23 21:51:48,191 - LinearScheduler | warmup_fraction: '0.1'
2023-10-23 21:51:48,191 ----------------------------------------------------------------------------------------------------
2023-10-23 21:51:48,191 Final evaluation on model from best epoch (best-model.pt)
2023-10-23 21:51:48,192 - metric: "('micro avg', 'f1-score')"
2023-10-23 21:51:48,192 ----------------------------------------------------------------------------------------------------
2023-10-23 21:51:48,192 Computation:
2023-10-23 21:51:48,192 - compute on device: cuda:0
2023-10-23 21:51:48,192 - embedding storage: none
2023-10-23 21:51:48,192 ----------------------------------------------------------------------------------------------------
2023-10-23 21:51:48,192 Model training base path: "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-3"
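
Given the parameters above, the run boils down to Flair's fine-tuning entry point. A sketch, assuming fine_tune attaches the linear warmup schedule by default; note that warmup_fraction 0.1 over 10 epochs x 447 iterations = 4470 steps means the learning rate climbs to 5e-05 across roughly the first epoch (matching the per-iteration lr values below) and then decays linearly to zero:

from flair.trainers import ModelTrainer

# `tagger` and `corpus` as in the sketches above.
trainer = ModelTrainer(tagger, corpus)

trainer.fine_tune(
    "hmbench-hipe2020/de-dbmdz/...",  # base path abbreviated; full value logged above
    learning_rate=5e-05,
    mini_batch_size=8,
    max_epochs=10,
    # Assumption: recent Flair versions add a linear scheduler plugin with
    # warmup_fraction=0.1 by default and track the best model on dev micro-F1.
)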
2023-10-23 21:51:48,192 ----------------------------------------------------------------------------------------------------
2023-10-23 21:51:48,192 ----------------------------------------------------------------------------------------------------
2023-10-23 21:51:48,192 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-23 21:51:51,912 epoch 1 - iter 44/447 - loss 2.29313211 - time (sec): 3.72 - samples/sec: 2237.83 - lr: 0.000005 - momentum: 0.000000
2023-10-23 21:51:56,026 epoch 1 - iter 88/447 - loss 1.38865220 - time (sec): 7.83 - samples/sec: 2193.17 - lr: 0.000010 - momentum: 0.000000
2023-10-23 21:51:59,986 epoch 1 - iter 132/447 - loss 1.05659540 - time (sec): 11.79 - samples/sec: 2206.41 - lr: 0.000015 - momentum: 0.000000
2023-10-23 21:52:03,902 epoch 1 - iter 176/447 - loss 0.87090716 - time (sec): 15.71 - samples/sec: 2211.45 - lr: 0.000020 - momentum: 0.000000
2023-10-23 21:52:08,079 epoch 1 - iter 220/447 - loss 0.75514341 - time (sec): 19.89 - samples/sec: 2202.39 - lr: 0.000024 - momentum: 0.000000
2023-10-23 21:52:11,971 epoch 1 - iter 264/447 - loss 0.68444882 - time (sec): 23.78 - samples/sec: 2187.95 - lr: 0.000029 - momentum: 0.000000
2023-10-23 21:52:16,054 epoch 1 - iter 308/447 - loss 0.62305335 - time (sec): 27.86 - samples/sec: 2175.18 - lr: 0.000034 - momentum: 0.000000
2023-10-23 21:52:19,742 epoch 1 - iter 352/447 - loss 0.57516175 - time (sec): 31.55 - samples/sec: 2176.87 - lr: 0.000039 - momentum: 0.000000
2023-10-23 21:52:23,652 epoch 1 - iter 396/447 - loss 0.53631211 - time (sec): 35.46 - samples/sec: 2171.06 - lr: 0.000044 - momentum: 0.000000
2023-10-23 21:52:27,765 epoch 1 - iter 440/447 - loss 0.50311971 - time (sec): 39.57 - samples/sec: 2153.98 - lr: 0.000049 - momentum: 0.000000
2023-10-23 21:52:28,386 ----------------------------------------------------------------------------------------------------
2023-10-23 21:52:28,386 EPOCH 1 done: loss 0.4989 - lr: 0.000049
2023-10-23 21:52:33,199 DEV : loss 0.15912418067455292 - f1-score (micro avg) 0.6212
2023-10-23 21:52:33,219 saving best model
2023-10-23 21:52:33,690 ----------------------------------------------------------------------------------------------------
2023-10-23 21:52:37,849 epoch 2 - iter 44/447 - loss 0.16382777 - time (sec): 4.16 - samples/sec: 2262.16 - lr: 0.000049 - momentum: 0.000000
2023-10-23 21:52:41,636 epoch 2 - iter 88/447 - loss 0.16917138 - time (sec): 7.95 - samples/sec: 2191.29 - lr: 0.000049 - momentum: 0.000000
2023-10-23 21:52:45,825 epoch 2 - iter 132/447 - loss 0.15404062 - time (sec): 12.13 - samples/sec: 2184.55 - lr: 0.000048 - momentum: 0.000000
2023-10-23 21:52:49,765 epoch 2 - iter 176/447 - loss 0.14805448 - time (sec): 16.07 - samples/sec: 2172.63 - lr: 0.000048 - momentum: 0.000000
2023-10-23 21:52:53,644 epoch 2 - iter 220/447 - loss 0.15365492 - time (sec): 19.95 - samples/sec: 2183.61 - lr: 0.000047 - momentum: 0.000000
2023-10-23 21:52:57,585 epoch 2 - iter 264/447 - loss 0.15189904 - time (sec): 23.89 - samples/sec: 2159.40 - lr: 0.000047 - momentum: 0.000000
2023-10-23 21:53:01,367 epoch 2 - iter 308/447 - loss 0.14490678 - time (sec): 27.68 - samples/sec: 2169.16 - lr: 0.000046 - momentum: 0.000000
2023-10-23 21:53:05,090 epoch 2 - iter 352/447 - loss 0.14359644 - time (sec): 31.40 - samples/sec: 2164.50 - lr: 0.000046 - momentum: 0.000000
2023-10-23 21:53:09,403 epoch 2 - iter 396/447 - loss 0.14016823 - time (sec): 35.71 - samples/sec: 2169.84 - lr: 0.000045 - momentum: 0.000000
2023-10-23 21:53:13,246 epoch 2 - iter 440/447 - loss 0.13820665 - time (sec): 39.56 - samples/sec: 2156.66 - lr: 0.000045 - momentum: 0.000000
2023-10-23 21:53:13,842 ----------------------------------------------------------------------------------------------------
2023-10-23 21:53:13,842 EPOCH 2 done: loss 0.1373 - lr: 0.000045
2023-10-23 21:53:20,306 DEV : loss 0.14210455119609833 - f1-score (micro avg) 0.7045
2023-10-23 21:53:20,326 saving best model
2023-10-23 21:53:20,913 ----------------------------------------------------------------------------------------------------
2023-10-23 21:53:25,525 epoch 3 - iter 44/447 - loss 0.06227375 - time (sec): 4.61 - samples/sec: 2261.38 - lr: 0.000044 - momentum: 0.000000
2023-10-23 21:53:29,525 epoch 3 - iter 88/447 - loss 0.07165190 - time (sec): 8.61 - samples/sec: 2207.97 - lr: 0.000043 - momentum: 0.000000
2023-10-23 21:53:33,454 epoch 3 - iter 132/447 - loss 0.08160062 - time (sec): 12.54 - samples/sec: 2176.12 - lr: 0.000043 - momentum: 0.000000
2023-10-23 21:53:37,270 epoch 3 - iter 176/447 - loss 0.07902211 - time (sec): 16.36 - samples/sec: 2161.58 - lr: 0.000042 - momentum: 0.000000
2023-10-23 21:53:41,358 epoch 3 - iter 220/447 - loss 0.08062267 - time (sec): 20.44 - samples/sec: 2137.04 - lr: 0.000042 - momentum: 0.000000
2023-10-23 21:53:45,283 epoch 3 - iter 264/447 - loss 0.07920300 - time (sec): 24.37 - samples/sec: 2141.99 - lr: 0.000041 - momentum: 0.000000
2023-10-23 21:53:49,123 epoch 3 - iter 308/447 - loss 0.07732504 - time (sec): 28.21 - samples/sec: 2170.23 - lr: 0.000041 - momentum: 0.000000
2023-10-23 21:53:52,752 epoch 3 - iter 352/447 - loss 0.07708732 - time (sec): 31.84 - samples/sec: 2155.62 - lr: 0.000040 - momentum: 0.000000
2023-10-23 21:53:56,778 epoch 3 - iter 396/447 - loss 0.07923155 - time (sec): 35.86 - samples/sec: 2139.98 - lr: 0.000040 - momentum: 0.000000
2023-10-23 21:54:00,715 epoch 3 - iter 440/447 - loss 0.07827057 - time (sec): 39.80 - samples/sec: 2145.05 - lr: 0.000039 - momentum: 0.000000
2023-10-23 21:54:01,263 ----------------------------------------------------------------------------------------------------
2023-10-23 21:54:01,264 EPOCH 3 done: loss 0.0777 - lr: 0.000039
2023-10-23 21:54:07,723 DEV : loss 0.14090511202812195 - f1-score (micro avg) 0.734
2023-10-23 21:54:07,743 saving best model
2023-10-23 21:54:08,333 ----------------------------------------------------------------------------------------------------
2023-10-23 21:54:12,185 epoch 4 - iter 44/447 - loss 0.03326246 - time (sec): 3.85 - samples/sec: 2188.51 - lr: 0.000038 - momentum: 0.000000
2023-10-23 21:54:16,012 epoch 4 - iter 88/447 - loss 0.05068608 - time (sec): 7.68 - samples/sec: 2182.23 - lr: 0.000038 - momentum: 0.000000
2023-10-23 21:54:20,213 epoch 4 - iter 132/447 - loss 0.04556488 - time (sec): 11.88 - samples/sec: 2185.55 - lr: 0.000037 - momentum: 0.000000
2023-10-23 21:54:24,231 epoch 4 - iter 176/447 - loss 0.04645699 - time (sec): 15.90 - samples/sec: 2148.79 - lr: 0.000037 - momentum: 0.000000
2023-10-23 21:54:28,566 epoch 4 - iter 220/447 - loss 0.04553391 - time (sec): 20.23 - samples/sec: 2164.44 - lr: 0.000036 - momentum: 0.000000
2023-10-23 21:54:32,420 epoch 4 - iter 264/447 - loss 0.04715501 - time (sec): 24.09 - samples/sec: 2150.66 - lr: 0.000036 - momentum: 0.000000
2023-10-23 21:54:36,267 epoch 4 - iter 308/447 - loss 0.04693343 - time (sec): 27.93 - samples/sec: 2140.13 - lr: 0.000035 - momentum: 0.000000
2023-10-23 21:54:39,947 epoch 4 - iter 352/447 - loss 0.04637201 - time (sec): 31.61 - samples/sec: 2136.49 - lr: 0.000035 - momentum: 0.000000
2023-10-23 21:54:44,126 epoch 4 - iter 396/447 - loss 0.04595036 - time (sec): 35.79 - samples/sec: 2128.05 - lr: 0.000034 - momentum: 0.000000
2023-10-23 21:54:48,078 epoch 4 - iter 440/447 - loss 0.04489916 - time (sec): 39.74 - samples/sec: 2135.48 - lr: 0.000033 - momentum: 0.000000
2023-10-23 21:54:48,940 ----------------------------------------------------------------------------------------------------
2023-10-23 21:54:48,941 EPOCH 4 done: loss 0.0446 - lr: 0.000033
2023-10-23 21:54:55,383 DEV : loss 0.19732795655727386 - f1-score (micro avg) 0.7408
2023-10-23 21:54:55,404 saving best model
2023-10-23 21:54:55,992 ----------------------------------------------------------------------------------------------------
2023-10-23 21:54:59,872 epoch 5 - iter 44/447 - loss 0.03250988 - time (sec): 3.88 - samples/sec: 2220.54 - lr: 0.000033 - momentum: 0.000000
2023-10-23 21:55:04,308 epoch 5 - iter 88/447 - loss 0.03182471 - time (sec): 8.32 - samples/sec: 2242.31 - lr: 0.000032 - momentum: 0.000000
2023-10-23 21:55:08,154 epoch 5 - iter 132/447 - loss 0.02881742 - time (sec): 12.16 - samples/sec: 2208.42 - lr: 0.000032 - momentum: 0.000000
2023-10-23 21:55:12,227 epoch 5 - iter 176/447 - loss 0.02933684 - time (sec): 16.23 - samples/sec: 2194.04 - lr: 0.000031 - momentum: 0.000000
2023-10-23 21:55:16,129 epoch 5 - iter 220/447 - loss 0.02984464 - time (sec): 20.14 - samples/sec: 2188.03 - lr: 0.000031 - momentum: 0.000000
2023-10-23 21:55:20,190 epoch 5 - iter 264/447 - loss 0.03171971 - time (sec): 24.20 - samples/sec: 2168.03 - lr: 0.000030 - momentum: 0.000000
2023-10-23 21:55:24,347 epoch 5 - iter 308/447 - loss 0.03032864 - time (sec): 28.35 - samples/sec: 2155.46 - lr: 0.000030 - momentum: 0.000000
2023-10-23 21:55:28,216 epoch 5 - iter 352/447 - loss 0.03110701 - time (sec): 32.22 - samples/sec: 2139.87 - lr: 0.000029 - momentum: 0.000000
2023-10-23 21:55:32,265 epoch 5 - iter 396/447 - loss 0.03273498 - time (sec): 36.27 - samples/sec: 2135.56 - lr: 0.000028 - momentum: 0.000000
2023-10-23 21:55:35,982 epoch 5 - iter 440/447 - loss 0.03230073 - time (sec): 39.99 - samples/sec: 2134.87 - lr: 0.000028 - momentum: 0.000000
2023-10-23 21:55:36,528 ----------------------------------------------------------------------------------------------------
2023-10-23 21:55:36,529 EPOCH 5 done: loss 0.0320 - lr: 0.000028
2023-10-23 21:55:42,985 DEV : loss 0.20663930475711823 - f1-score (micro avg) 0.764
2023-10-23 21:55:43,005 saving best model
2023-10-23 21:55:43,595 ----------------------------------------------------------------------------------------------------
2023-10-23 21:55:48,060 epoch 6 - iter 44/447 - loss 0.01932974 - time (sec): 4.46 - samples/sec: 2089.55 - lr: 0.000027 - momentum: 0.000000
2023-10-23 21:55:51,578 epoch 6 - iter 88/447 - loss 0.02196792 - time (sec): 7.98 - samples/sec: 2098.85 - lr: 0.000027 - momentum: 0.000000
2023-10-23 21:55:55,550 epoch 6 - iter 132/447 - loss 0.02780813 - time (sec): 11.95 - samples/sec: 2120.65 - lr: 0.000026 - momentum: 0.000000
2023-10-23 21:56:00,231 epoch 6 - iter 176/447 - loss 0.02510854 - time (sec): 16.63 - samples/sec: 2083.70 - lr: 0.000026 - momentum: 0.000000
2023-10-23 21:56:04,343 epoch 6 - iter 220/447 - loss 0.02380277 - time (sec): 20.75 - samples/sec: 2081.98 - lr: 0.000025 - momentum: 0.000000
2023-10-23 21:56:08,170 epoch 6 - iter 264/447 - loss 0.02441174 - time (sec): 24.57 - samples/sec: 2087.86 - lr: 0.000025 - momentum: 0.000000
2023-10-23 21:56:11,950 epoch 6 - iter 308/447 - loss 0.02554501 - time (sec): 28.35 - samples/sec: 2088.99 - lr: 0.000024 - momentum: 0.000000
2023-10-23 21:56:15,673 epoch 6 - iter 352/447 - loss 0.02528824 - time (sec): 32.08 - samples/sec: 2110.35 - lr: 0.000023 - momentum: 0.000000
2023-10-23 21:56:19,481 epoch 6 - iter 396/447 - loss 0.02554394 - time (sec): 35.88 - samples/sec: 2125.00 - lr: 0.000023 - momentum: 0.000000
2023-10-23 21:56:23,611 epoch 6 - iter 440/447 - loss 0.02544473 - time (sec): 40.01 - samples/sec: 2128.70 - lr: 0.000022 - momentum: 0.000000
2023-10-23 21:56:24,236 ----------------------------------------------------------------------------------------------------
2023-10-23 21:56:24,236 EPOCH 6 done: loss 0.0255 - lr: 0.000022
2023-10-23 21:56:30,728 DEV : loss 0.22447437047958374 - f1-score (micro avg) 0.7667
2023-10-23 21:56:30,748 saving best model
2023-10-23 21:56:31,337 ----------------------------------------------------------------------------------------------------
2023-10-23 21:56:35,589 epoch 7 - iter 44/447 - loss 0.01544098 - time (sec): 4.25 - samples/sec: 2160.52 - lr: 0.000022 - momentum: 0.000000
2023-10-23 21:56:40,182 epoch 7 - iter 88/447 - loss 0.01431395 - time (sec): 8.84 - samples/sec: 2132.61 - lr: 0.000021 - momentum: 0.000000
2023-10-23 21:56:43,965 epoch 7 - iter 132/447 - loss 0.01454596 - time (sec): 12.63 - samples/sec: 2163.80 - lr: 0.000021 - momentum: 0.000000
2023-10-23 21:56:47,703 epoch 7 - iter 176/447 - loss 0.01350481 - time (sec): 16.37 - samples/sec: 2140.77 - lr: 0.000020 - momentum: 0.000000
2023-10-23 21:56:51,756 epoch 7 - iter 220/447 - loss 0.01442563 - time (sec): 20.42 - samples/sec: 2112.10 - lr: 0.000020 - momentum: 0.000000
2023-10-23 21:56:55,642 epoch 7 - iter 264/447 - loss 0.01532664 - time (sec): 24.30 - samples/sec: 2115.03 - lr: 0.000019 - momentum: 0.000000
2023-10-23 21:56:59,603 epoch 7 - iter 308/447 - loss 0.01488010 - time (sec): 28.27 - samples/sec: 2132.00 - lr: 0.000018 - momentum: 0.000000
2023-10-23 21:57:03,516 epoch 7 - iter 352/447 - loss 0.01462539 - time (sec): 32.18 - samples/sec: 2127.87 - lr: 0.000018 - momentum: 0.000000
2023-10-23 21:57:07,421 epoch 7 - iter 396/447 - loss 0.01629265 - time (sec): 36.08 - samples/sec: 2136.74 - lr: 0.000017 - momentum: 0.000000
2023-10-23 21:57:11,303 epoch 7 - iter 440/447 - loss 0.01580825 - time (sec): 39.97 - samples/sec: 2132.81 - lr: 0.000017 - momentum: 0.000000
2023-10-23 21:57:11,917 ----------------------------------------------------------------------------------------------------
2023-10-23 21:57:11,917 EPOCH 7 done: loss 0.0157 - lr: 0.000017
2023-10-23 21:57:18,395 DEV : loss 0.23538099229335785 - f1-score (micro avg) 0.7841
2023-10-23 21:57:18,415 saving best model
2023-10-23 21:57:19,004 ----------------------------------------------------------------------------------------------------
2023-10-23 21:57:23,234 epoch 8 - iter 44/447 - loss 0.00792987 - time (sec): 4.23 - samples/sec: 2029.91 - lr: 0.000016 - momentum: 0.000000
2023-10-23 21:57:27,029 epoch 8 - iter 88/447 - loss 0.01703551 - time (sec): 8.02 - samples/sec: 2082.77 - lr: 0.000016 - momentum: 0.000000
2023-10-23 21:57:30,991 epoch 8 - iter 132/447 - loss 0.01456325 - time (sec): 11.99 - samples/sec: 2081.23 - lr: 0.000015 - momentum: 0.000000
2023-10-23 21:57:34,736 epoch 8 - iter 176/447 - loss 0.01409087 - time (sec): 15.73 - samples/sec: 2096.44 - lr: 0.000015 - momentum: 0.000000
2023-10-23 21:57:38,810 epoch 8 - iter 220/447 - loss 0.01363118 - time (sec): 19.81 - samples/sec: 2090.90 - lr: 0.000014 - momentum: 0.000000
2023-10-23 21:57:42,589 epoch 8 - iter 264/447 - loss 0.01253893 - time (sec): 23.58 - samples/sec: 2107.17 - lr: 0.000013 - momentum: 0.000000
2023-10-23 21:57:47,006 epoch 8 - iter 308/447 - loss 0.01335359 - time (sec): 28.00 - samples/sec: 2116.04 - lr: 0.000013 - momentum: 0.000000
2023-10-23 21:57:50,843 epoch 8 - iter 352/447 - loss 0.01275915 - time (sec): 31.84 - samples/sec: 2113.19 - lr: 0.000012 - momentum: 0.000000
2023-10-23 21:57:54,897 epoch 8 - iter 396/447 - loss 0.01177829 - time (sec): 35.89 - samples/sec: 2129.32 - lr: 0.000012 - momentum: 0.000000
2023-10-23 21:57:58,945 epoch 8 - iter 440/447 - loss 0.01177727 - time (sec): 39.94 - samples/sec: 2135.01 - lr: 0.000011 - momentum: 0.000000
2023-10-23 21:57:59,590 ----------------------------------------------------------------------------------------------------
2023-10-23 21:57:59,590 EPOCH 8 done: loss 0.0116 - lr: 0.000011
2023-10-23 21:58:06,070 DEV : loss 0.26462581753730774 - f1-score (micro avg) 0.7742
2023-10-23 21:58:06,090 ----------------------------------------------------------------------------------------------------
2023-10-23 21:58:09,847 epoch 9 - iter 44/447 - loss 0.00576407 - time (sec): 3.76 - samples/sec: 2085.12 - lr: 0.000011 - momentum: 0.000000
2023-10-23 21:58:14,233 epoch 9 - iter 88/447 - loss 0.00601925 - time (sec): 8.14 - samples/sec: 2162.51 - lr: 0.000010 - momentum: 0.000000
2023-10-23 21:58:18,304 epoch 9 - iter 132/447 - loss 0.00746578 - time (sec): 12.21 - samples/sec: 2172.28 - lr: 0.000010 - momentum: 0.000000
2023-10-23 21:58:22,338 epoch 9 - iter 176/447 - loss 0.00658948 - time (sec): 16.25 - samples/sec: 2153.41 - lr: 0.000009 - momentum: 0.000000
2023-10-23 21:58:26,476 epoch 9 - iter 220/447 - loss 0.00735456 - time (sec): 20.39 - samples/sec: 2139.00 - lr: 0.000008 - momentum: 0.000000
2023-10-23 21:58:30,916 epoch 9 - iter 264/447 - loss 0.00697126 - time (sec): 24.83 - samples/sec: 2129.10 - lr: 0.000008 - momentum: 0.000000
2023-10-23 21:58:34,709 epoch 9 - iter 308/447 - loss 0.00649014 - time (sec): 28.62 - samples/sec: 2132.46 - lr: 0.000007 - momentum: 0.000000
2023-10-23 21:58:38,420 epoch 9 - iter 352/447 - loss 0.00714014 - time (sec): 32.33 - samples/sec: 2132.07 - lr: 0.000007 - momentum: 0.000000
2023-10-23 21:58:42,066 epoch 9 - iter 396/447 - loss 0.00643507 - time (sec): 35.98 - samples/sec: 2129.34 - lr: 0.000006 - momentum: 0.000000
2023-10-23 21:58:45,856 epoch 9 - iter 440/447 - loss 0.00633788 - time (sec): 39.76 - samples/sec: 2136.00 - lr: 0.000006 - momentum: 0.000000
2023-10-23 21:58:46,577 ----------------------------------------------------------------------------------------------------
2023-10-23 21:58:46,577 EPOCH 9 done: loss 0.0062 - lr: 0.000006
2023-10-23 21:58:53,066 DEV : loss 0.25273650884628296 - f1-score (micro avg) 0.7833
2023-10-23 21:58:53,086 ----------------------------------------------------------------------------------------------------
2023-10-23 21:58:57,441 epoch 10 - iter 44/447 - loss 0.00306477 - time (sec): 4.35 - samples/sec: 2095.30 - lr: 0.000005 - momentum: 0.000000
2023-10-23 21:59:01,370 epoch 10 - iter 88/447 - loss 0.00266064 - time (sec): 8.28 - samples/sec: 2113.69 - lr: 0.000005 - momentum: 0.000000
2023-10-23 21:59:05,788 epoch 10 - iter 132/447 - loss 0.00212850 - time (sec): 12.70 - samples/sec: 2136.19 - lr: 0.000004 - momentum: 0.000000
2023-10-23 21:59:09,588 epoch 10 - iter 176/447 - loss 0.00183704 - time (sec): 16.50 - samples/sec: 2145.41 - lr: 0.000003 - momentum: 0.000000
2023-10-23 21:59:13,508 epoch 10 - iter 220/447 - loss 0.00284510 - time (sec): 20.42 - samples/sec: 2131.51 - lr: 0.000003 - momentum: 0.000000
2023-10-23 21:59:17,442 epoch 10 - iter 264/447 - loss 0.00406030 - time (sec): 24.36 - samples/sec: 2146.58 - lr: 0.000002 - momentum: 0.000000
2023-10-23 21:59:21,210 epoch 10 - iter 308/447 - loss 0.00388394 - time (sec): 28.12 - samples/sec: 2146.26 - lr: 0.000002 - momentum: 0.000000
2023-10-23 21:59:25,355 epoch 10 - iter 352/447 - loss 0.00402897 - time (sec): 32.27 - samples/sec: 2151.17 - lr: 0.000001 - momentum: 0.000000
2023-10-23 21:59:29,040 epoch 10 - iter 396/447 - loss 0.00401045 - time (sec): 35.95 - samples/sec: 2144.19 - lr: 0.000001 - momentum: 0.000000
2023-10-23 21:59:32,940 epoch 10 - iter 440/447 - loss 0.00388586 - time (sec): 39.85 - samples/sec: 2137.91 - lr: 0.000000 - momentum: 0.000000
2023-10-23 21:59:33,556 ----------------------------------------------------------------------------------------------------
2023-10-23 21:59:33,557 EPOCH 10 done: loss 0.0038 - lr: 0.000000
2023-10-23 21:59:39,761 DEV : loss 0.261545330286026 - f1-score (micro avg) 0.791
2023-10-23 21:59:39,782 saving best model
2023-10-23 21:59:40,837 ----------------------------------------------------------------------------------------------------
2023-10-23 21:59:40,838 Loading model from best epoch ...
2023-10-23 21:59:42,794 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
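
The 21 tags follow the BIOES scheme (Begin/Inside/End/Single plus O) over five entity types: loc, pers, org, prod, time. A minimal usage sketch for the saved checkpoint; the example sentence and the abbreviated path are placeholders:

from flair.data import Sentence
from flair.models import SequenceTagger

# Load the checkpoint written at each "saving best model" step above.
tagger = SequenceTagger.load("hmbench-hipe2020/de-dbmdz/.../best-model.pt")

sentence = Sentence("Albert Schweitzer wurde in Kaysersberg geboren .")  # hypothetical input
tagger.predict(sentence)

for span in sentence.get_spans("ner"):
    print(span.text, span.get_label("ner").value)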
2023-10-23 21:59:47,343
Results:
- F-score (micro) 0.7492
- F-score (macro) 0.6673
- Accuracy 0.6174
By class:
              precision    recall  f1-score   support

         loc     0.8385    0.8624    0.8503       596
        pers     0.6556    0.7718    0.7090       333
         org     0.4885    0.4848    0.4867       132
        prod     0.6346    0.5000    0.5593        66
        time     0.7727    0.6939    0.7312        49

   micro avg     0.7321    0.7670    0.7492      1176
   macro avg     0.6780    0.6626    0.6673      1176
weighted avg     0.7332    0.7670    0.7482      1176
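
As a sanity check, the micro-averaged row is self-consistent: with 1176 gold spans, the reported precision and recall both fit 902 true positives out of 1232 predicted spans (counts inferred, not logged), and the micro F-score follows:

# Counts inferred from the table above.
tp, pred, gold = 902, 1232, 1176
precision = tp / pred        # 0.7321
recall    = tp / gold        # 0.7670
f1 = 2 * tp / (pred + gold)  # 0.7492, the reported micro-avg F-score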
2023-10-23 21:59:47,343 ----------------------------------------------------------------------------------------------------