2023-10-15 11:23:10,757 ----------------------------------------------------------------------------------------------------
2023-10-15 11:23:10,758 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-15 11:23:10,758 ----------------------------------------------------------------------------------------------------
2023-10-15 11:23:10,758 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-15 11:23:10,758 ----------------------------------------------------------------------------------------------------
2023-10-15 11:23:10,758 Train: 20847 sentences
2023-10-15 11:23:10,758 (train_with_dev=False, train_with_test=False)
2023-10-15 11:23:10,758 ----------------------------------------------------------------------------------------------------
2023-10-15 11:23:10,758 Training Params:
2023-10-15 11:23:10,758  - learning_rate: "3e-05"
2023-10-15 11:23:10,758  - mini_batch_size: "8"
2023-10-15 11:23:10,758  - max_epochs: "10"
2023-10-15 11:23:10,758  - shuffle: "True"
2023-10-15 11:23:10,758 ----------------------------------------------------------------------------------------------------
2023-10-15 11:23:10,758 Plugins:
2023-10-15 11:23:10,758  - LinearScheduler | warmup_fraction: '0.1'
2023-10-15 11:23:10,758 ----------------------------------------------------------------------------------------------------
2023-10-15 11:23:10,758 Final evaluation on model from best epoch (best-model.pt)
2023-10-15 11:23:10,758  - metric: "('micro avg', 'f1-score')"
2023-10-15 11:23:10,758 ----------------------------------------------------------------------------------------------------
2023-10-15 11:23:10,758 Computation:
2023-10-15 11:23:10,758  - compute on device: cuda:0
2023-10-15 11:23:10,758  - embedding storage: none
2023-10-15 11:23:10,758 ----------------------------------------------------------------------------------------------------
2023-10-15 11:23:10,758 Model training base path: "hmbench-newseye/de-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-1"
2023-10-15 11:23:10,758 ----------------------------------------------------------------------------------------------------
2023-10-15 11:23:10,758 ----------------------------------------------------------------------------------------------------
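For reference, here is a minimal sketch of how the configuration logged above might be set up with Flair. This is a reconstruction, not the original training script: the NER_HIPE_2022 loader arguments, the hidden_size value, and the use_rnn/reproject_embeddings flags are assumptions inferred from the dataset path, the run name, and the printed architecture.

```python
# Sketch of a Flair fine-tuning setup matching the logged configuration.
# Assumed: corpus-loader arguments (dataset_name, language,
# add_document_separator) inferred from the dataset path above.
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

corpus = NER_HIPE_2022(dataset_name="newseye", language="de",
                       add_document_separator=True)
label_dict = corpus.make_label_dictionary(label_type="ner")

embeddings = TransformerWordEmbeddings(
    model="dbmdz/bert-base-historic-multilingual-cased",
    layers="-1",               # "layers-1" in the run name: last layer only
    subtoken_pooling="first",  # "poolingfirst" in the run name
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,           # unused when use_rnn=False, but required
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,             # "crfFalse": plain linear head + CrossEntropyLoss
    use_rnn=False,             # matches Linear(768, 17) directly on embeddings
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-newseye/de-dbmdz/bert-base-historic-multilingual-cased"
    "-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-1",
    learning_rate=3e-5,
    mini_batch_size=8,
    max_epochs=10,  # fine_tune defaults to a linear schedule with 0.1 warmup
)
```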
"hmbench-newseye/de-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-1" 2023-10-15 11:23:10,758 ---------------------------------------------------------------------------------------------------- 2023-10-15 11:23:10,758 ---------------------------------------------------------------------------------------------------- 2023-10-15 11:23:30,871 epoch 1 - iter 260/2606 - loss 1.95608573 - time (sec): 20.11 - samples/sec: 1821.03 - lr: 0.000003 - momentum: 0.000000 2023-10-15 11:23:49,246 epoch 1 - iter 520/2606 - loss 1.20498957 - time (sec): 38.49 - samples/sec: 1902.63 - lr: 0.000006 - momentum: 0.000000 2023-10-15 11:24:08,718 epoch 1 - iter 780/2606 - loss 0.88774684 - time (sec): 57.96 - samples/sec: 1938.13 - lr: 0.000009 - momentum: 0.000000 2023-10-15 11:24:26,997 epoch 1 - iter 1040/2606 - loss 0.74089882 - time (sec): 76.24 - samples/sec: 1944.19 - lr: 0.000012 - momentum: 0.000000 2023-10-15 11:24:46,543 epoch 1 - iter 1300/2606 - loss 0.62868238 - time (sec): 95.78 - samples/sec: 1963.38 - lr: 0.000015 - momentum: 0.000000 2023-10-15 11:25:05,186 epoch 1 - iter 1560/2606 - loss 0.56241762 - time (sec): 114.43 - samples/sec: 1968.23 - lr: 0.000018 - momentum: 0.000000 2023-10-15 11:25:22,602 epoch 1 - iter 1820/2606 - loss 0.51681442 - time (sec): 131.84 - samples/sec: 1961.99 - lr: 0.000021 - momentum: 0.000000 2023-10-15 11:25:41,872 epoch 1 - iter 2080/2606 - loss 0.47721131 - time (sec): 151.11 - samples/sec: 1963.11 - lr: 0.000024 - momentum: 0.000000 2023-10-15 11:26:00,694 epoch 1 - iter 2340/2606 - loss 0.44717332 - time (sec): 169.93 - samples/sec: 1957.85 - lr: 0.000027 - momentum: 0.000000 2023-10-15 11:26:18,704 epoch 1 - iter 2600/2606 - loss 0.42510976 - time (sec): 187.94 - samples/sec: 1951.67 - lr: 0.000030 - momentum: 0.000000 2023-10-15 11:26:19,102 ---------------------------------------------------------------------------------------------------- 2023-10-15 11:26:19,103 EPOCH 1 done: loss 0.4246 - lr: 0.000030 2023-10-15 11:26:24,777 DEV : loss 0.10493393242359161 - f1-score (micro avg) 0.2874 2023-10-15 11:26:24,804 saving best model 2023-10-15 11:26:25,279 ---------------------------------------------------------------------------------------------------- 2023-10-15 11:26:44,170 epoch 2 - iter 260/2606 - loss 0.15135290 - time (sec): 18.89 - samples/sec: 2012.96 - lr: 0.000030 - momentum: 0.000000 2023-10-15 11:27:02,300 epoch 2 - iter 520/2606 - loss 0.15584007 - time (sec): 37.02 - samples/sec: 1928.81 - lr: 0.000029 - momentum: 0.000000 2023-10-15 11:27:21,347 epoch 2 - iter 780/2606 - loss 0.15466883 - time (sec): 56.07 - samples/sec: 1940.15 - lr: 0.000029 - momentum: 0.000000 2023-10-15 11:27:40,743 epoch 2 - iter 1040/2606 - loss 0.15722047 - time (sec): 75.46 - samples/sec: 1945.92 - lr: 0.000029 - momentum: 0.000000 2023-10-15 11:27:59,737 epoch 2 - iter 1300/2606 - loss 0.15371850 - time (sec): 94.46 - samples/sec: 1945.89 - lr: 0.000028 - momentum: 0.000000 2023-10-15 11:28:17,978 epoch 2 - iter 1560/2606 - loss 0.15077449 - time (sec): 112.70 - samples/sec: 1928.06 - lr: 0.000028 - momentum: 0.000000 2023-10-15 11:28:37,463 epoch 2 - iter 1820/2606 - loss 0.15121932 - time (sec): 132.18 - samples/sec: 1928.51 - lr: 0.000028 - momentum: 0.000000 2023-10-15 11:28:56,764 epoch 2 - iter 2080/2606 - loss 0.14626589 - time (sec): 151.48 - samples/sec: 1935.22 - lr: 0.000027 - momentum: 0.000000 2023-10-15 11:29:16,655 epoch 2 - iter 2340/2606 - loss 0.14497160 - time (sec): 171.37 - samples/sec: 1942.69 
2023-10-15 11:26:44,170 epoch 2 - iter 260/2606 - loss 0.15135290 - time (sec): 18.89 - samples/sec: 2012.96 - lr: 0.000030 - momentum: 0.000000
2023-10-15 11:27:02,300 epoch 2 - iter 520/2606 - loss 0.15584007 - time (sec): 37.02 - samples/sec: 1928.81 - lr: 0.000029 - momentum: 0.000000
2023-10-15 11:27:21,347 epoch 2 - iter 780/2606 - loss 0.15466883 - time (sec): 56.07 - samples/sec: 1940.15 - lr: 0.000029 - momentum: 0.000000
2023-10-15 11:27:40,743 epoch 2 - iter 1040/2606 - loss 0.15722047 - time (sec): 75.46 - samples/sec: 1945.92 - lr: 0.000029 - momentum: 0.000000
2023-10-15 11:27:59,737 epoch 2 - iter 1300/2606 - loss 0.15371850 - time (sec): 94.46 - samples/sec: 1945.89 - lr: 0.000028 - momentum: 0.000000
2023-10-15 11:28:17,978 epoch 2 - iter 1560/2606 - loss 0.15077449 - time (sec): 112.70 - samples/sec: 1928.06 - lr: 0.000028 - momentum: 0.000000
2023-10-15 11:28:37,463 epoch 2 - iter 1820/2606 - loss 0.15121932 - time (sec): 132.18 - samples/sec: 1928.51 - lr: 0.000028 - momentum: 0.000000
2023-10-15 11:28:56,764 epoch 2 - iter 2080/2606 - loss 0.14626589 - time (sec): 151.48 - samples/sec: 1935.22 - lr: 0.000027 - momentum: 0.000000
2023-10-15 11:29:16,655 epoch 2 - iter 2340/2606 - loss 0.14497160 - time (sec): 171.37 - samples/sec: 1942.69 - lr: 0.000027 - momentum: 0.000000
2023-10-15 11:29:34,519 epoch 2 - iter 2600/2606 - loss 0.14582616 - time (sec): 189.24 - samples/sec: 1937.24 - lr: 0.000027 - momentum: 0.000000
2023-10-15 11:29:34,896 ----------------------------------------------------------------------------------------------------
2023-10-15 11:29:34,896 EPOCH 2 done: loss 0.1457 - lr: 0.000027
2023-10-15 11:29:43,811 DEV : loss 0.1157253161072731 - f1-score (micro avg) 0.3282
2023-10-15 11:29:43,842 saving best model
2023-10-15 11:29:44,383 ----------------------------------------------------------------------------------------------------
2023-10-15 11:30:04,129 epoch 3 - iter 260/2606 - loss 0.09672823 - time (sec): 19.74 - samples/sec: 1911.15 - lr: 0.000026 - momentum: 0.000000
2023-10-15 11:30:22,485 epoch 3 - iter 520/2606 - loss 0.09813069 - time (sec): 38.10 - samples/sec: 1920.33 - lr: 0.000026 - momentum: 0.000000
2023-10-15 11:30:41,096 epoch 3 - iter 780/2606 - loss 0.09789230 - time (sec): 56.71 - samples/sec: 1899.82 - lr: 0.000026 - momentum: 0.000000
2023-10-15 11:30:59,847 epoch 3 - iter 1040/2606 - loss 0.09970218 - time (sec): 75.46 - samples/sec: 1901.46 - lr: 0.000025 - momentum: 0.000000
2023-10-15 11:31:18,566 epoch 3 - iter 1300/2606 - loss 0.10029814 - time (sec): 94.18 - samples/sec: 1913.54 - lr: 0.000025 - momentum: 0.000000
2023-10-15 11:31:37,456 epoch 3 - iter 1560/2606 - loss 0.09911648 - time (sec): 113.07 - samples/sec: 1925.81 - lr: 0.000025 - momentum: 0.000000
2023-10-15 11:31:56,736 epoch 3 - iter 1820/2606 - loss 0.09770698 - time (sec): 132.35 - samples/sec: 1940.18 - lr: 0.000024 - momentum: 0.000000
2023-10-15 11:32:14,998 epoch 3 - iter 2080/2606 - loss 0.09548429 - time (sec): 150.61 - samples/sec: 1943.35 - lr: 0.000024 - momentum: 0.000000
2023-10-15 11:32:34,739 epoch 3 - iter 2340/2606 - loss 0.09541300 - time (sec): 170.35 - samples/sec: 1936.42 - lr: 0.000024 - momentum: 0.000000
2023-10-15 11:32:53,656 epoch 3 - iter 2600/2606 - loss 0.09437391 - time (sec): 189.27 - samples/sec: 1938.48 - lr: 0.000023 - momentum: 0.000000
2023-10-15 11:32:54,014 ----------------------------------------------------------------------------------------------------
2023-10-15 11:32:54,014 EPOCH 3 done: loss 0.0944 - lr: 0.000023
2023-10-15 11:33:02,873 DEV : loss 0.2598646879196167 - f1-score (micro avg) 0.3297
2023-10-15 11:33:02,898 saving best model
2023-10-15 11:33:03,466 ----------------------------------------------------------------------------------------------------
2023-10-15 11:33:21,480 epoch 4 - iter 260/2606 - loss 0.06904616 - time (sec): 18.01 - samples/sec: 1954.12 - lr: 0.000023 - momentum: 0.000000
2023-10-15 11:33:40,589 epoch 4 - iter 520/2606 - loss 0.06431747 - time (sec): 37.12 - samples/sec: 1901.00 - lr: 0.000023 - momentum: 0.000000
2023-10-15 11:34:00,177 epoch 4 - iter 780/2606 - loss 0.06516267 - time (sec): 56.71 - samples/sec: 1918.47 - lr: 0.000022 - momentum: 0.000000
2023-10-15 11:34:18,744 epoch 4 - iter 1040/2606 - loss 0.06696138 - time (sec): 75.28 - samples/sec: 1935.08 - lr: 0.000022 - momentum: 0.000000
2023-10-15 11:34:36,926 epoch 4 - iter 1300/2606 - loss 0.06644839 - time (sec): 93.46 - samples/sec: 1934.94 - lr: 0.000022 - momentum: 0.000000
2023-10-15 11:34:55,744 epoch 4 - iter 1560/2606 - loss 0.06837482 - time (sec): 112.28 - samples/sec: 1945.03 - lr: 0.000021 - momentum: 0.000000
2023-10-15 11:35:15,091 epoch 4 - iter 1820/2606 - loss 0.06759837 - time (sec): 131.62 - samples/sec: 1938.44 - lr: 0.000021 - momentum: 0.000000
2023-10-15 11:35:33,800 epoch 4 - iter 2080/2606 - loss 0.06569611 - time (sec): 150.33 - samples/sec: 1949.43 - lr: 0.000021 - momentum: 0.000000
2023-10-15 11:35:52,303 epoch 4 - iter 2340/2606 - loss 0.06595273 - time (sec): 168.84 - samples/sec: 1941.72 - lr: 0.000020 - momentum: 0.000000
2023-10-15 11:36:12,192 epoch 4 - iter 2600/2606 - loss 0.06525384 - time (sec): 188.72 - samples/sec: 1941.11 - lr: 0.000020 - momentum: 0.000000
2023-10-15 11:36:12,591 ----------------------------------------------------------------------------------------------------
2023-10-15 11:36:12,591 EPOCH 4 done: loss 0.0655 - lr: 0.000020
2023-10-15 11:36:21,481 DEV : loss 0.3398613929748535 - f1-score (micro avg) 0.3508
2023-10-15 11:36:21,508 saving best model
2023-10-15 11:36:22,101 ----------------------------------------------------------------------------------------------------
2023-10-15 11:36:40,837 epoch 5 - iter 260/2606 - loss 0.04681142 - time (sec): 18.73 - samples/sec: 1912.70 - lr: 0.000020 - momentum: 0.000000
2023-10-15 11:36:59,251 epoch 5 - iter 520/2606 - loss 0.04619425 - time (sec): 37.15 - samples/sec: 1898.26 - lr: 0.000019 - momentum: 0.000000
2023-10-15 11:37:18,191 epoch 5 - iter 780/2606 - loss 0.04916841 - time (sec): 56.09 - samples/sec: 1906.49 - lr: 0.000019 - momentum: 0.000000
2023-10-15 11:37:36,945 epoch 5 - iter 1040/2606 - loss 0.04835895 - time (sec): 74.84 - samples/sec: 1920.06 - lr: 0.000019 - momentum: 0.000000
2023-10-15 11:37:56,295 epoch 5 - iter 1300/2606 - loss 0.04935859 - time (sec): 94.19 - samples/sec: 1907.75 - lr: 0.000018 - momentum: 0.000000
2023-10-15 11:38:15,599 epoch 5 - iter 1560/2606 - loss 0.05069348 - time (sec): 113.50 - samples/sec: 1916.18 - lr: 0.000018 - momentum: 0.000000
2023-10-15 11:38:35,508 epoch 5 - iter 1820/2606 - loss 0.04985800 - time (sec): 133.41 - samples/sec: 1908.04 - lr: 0.000018 - momentum: 0.000000
2023-10-15 11:38:55,541 epoch 5 - iter 2080/2606 - loss 0.04931997 - time (sec): 153.44 - samples/sec: 1916.40 - lr: 0.000017 - momentum: 0.000000
2023-10-15 11:39:15,029 epoch 5 - iter 2340/2606 - loss 0.04925847 - time (sec): 172.93 - samples/sec: 1923.96 - lr: 0.000017 - momentum: 0.000000
2023-10-15 11:39:32,811 epoch 5 - iter 2600/2606 - loss 0.04882081 - time (sec): 190.71 - samples/sec: 1922.34 - lr: 0.000017 - momentum: 0.000000
2023-10-15 11:39:33,271 ----------------------------------------------------------------------------------------------------
2023-10-15 11:39:33,271 EPOCH 5 done: loss 0.0488 - lr: 0.000017
2023-10-15 11:39:42,111 DEV : loss 0.33827313780784607 - f1-score (micro avg) 0.3759
2023-10-15 11:39:42,135 saving best model
2023-10-15 11:39:42,636 ----------------------------------------------------------------------------------------------------
2023-10-15 11:40:02,083 epoch 6 - iter 260/2606 - loss 0.03216757 - time (sec): 19.44 - samples/sec: 2018.67 - lr: 0.000016 - momentum: 0.000000
2023-10-15 11:40:20,796 epoch 6 - iter 520/2606 - loss 0.03377516 - time (sec): 38.16 - samples/sec: 2000.21 - lr: 0.000016 - momentum: 0.000000
2023-10-15 11:40:40,451 epoch 6 - iter 780/2606 - loss 0.03450669 - time (sec): 57.81 - samples/sec: 2000.27 - lr: 0.000016 - momentum: 0.000000
2023-10-15 11:40:58,899 epoch 6 - iter 1040/2606 - loss 0.03447335 - time (sec): 76.26 - samples/sec: 1969.32 - lr: 0.000015 - momentum: 0.000000
2023-10-15 11:41:16,192 epoch 6 - iter 1300/2606 - loss 0.03445124 - time (sec): 93.55 - samples/sec: 1955.94 - lr: 0.000015 - momentum: 0.000000
2023-10-15 11:41:35,241 epoch 6 - iter 1560/2606 - loss 0.03426452 - time (sec): 112.60 - samples/sec: 1961.66 - lr: 0.000015 - momentum: 0.000000
2023-10-15 11:41:53,190 epoch 6 - iter 1820/2606 - loss 0.03449286 - time (sec): 130.55 - samples/sec: 1948.04 - lr: 0.000014 - momentum: 0.000000
2023-10-15 11:42:12,877 epoch 6 - iter 2080/2606 - loss 0.03459348 - time (sec): 150.24 - samples/sec: 1954.68 - lr: 0.000014 - momentum: 0.000000
2023-10-15 11:42:32,418 epoch 6 - iter 2340/2606 - loss 0.03491351 - time (sec): 169.78 - samples/sec: 1952.20 - lr: 0.000014 - momentum: 0.000000
2023-10-15 11:42:50,282 epoch 6 - iter 2600/2606 - loss 0.03468906 - time (sec): 187.64 - samples/sec: 1953.90 - lr: 0.000013 - momentum: 0.000000
2023-10-15 11:42:50,712 ----------------------------------------------------------------------------------------------------
2023-10-15 11:42:50,712 EPOCH 6 done: loss 0.0347 - lr: 0.000013
2023-10-15 11:42:59,578 DEV : loss 0.37671998143196106 - f1-score (micro avg) 0.3759
2023-10-15 11:42:59,603 saving best model
2023-10-15 11:43:00,168 ----------------------------------------------------------------------------------------------------
2023-10-15 11:43:18,395 epoch 7 - iter 260/2606 - loss 0.02095137 - time (sec): 18.22 - samples/sec: 1932.76 - lr: 0.000013 - momentum: 0.000000
2023-10-15 11:43:37,265 epoch 7 - iter 520/2606 - loss 0.02458808 - time (sec): 37.09 - samples/sec: 1940.53 - lr: 0.000013 - momentum: 0.000000
2023-10-15 11:43:56,200 epoch 7 - iter 780/2606 - loss 0.02842129 - time (sec): 56.03 - samples/sec: 1954.51 - lr: 0.000012 - momentum: 0.000000
2023-10-15 11:44:14,945 epoch 7 - iter 1040/2606 - loss 0.02983051 - time (sec): 74.78 - samples/sec: 1954.56 - lr: 0.000012 - momentum: 0.000000
2023-10-15 11:44:33,678 epoch 7 - iter 1300/2606 - loss 0.02902919 - time (sec): 93.51 - samples/sec: 1952.03 - lr: 0.000012 - momentum: 0.000000
2023-10-15 11:44:53,466 epoch 7 - iter 1560/2606 - loss 0.02891612 - time (sec): 113.30 - samples/sec: 1937.76 - lr: 0.000011 - momentum: 0.000000
2023-10-15 11:45:14,253 epoch 7 - iter 1820/2606 - loss 0.02831922 - time (sec): 134.08 - samples/sec: 1924.72 - lr: 0.000011 - momentum: 0.000000
2023-10-15 11:45:32,859 epoch 7 - iter 2080/2606 - loss 0.02779547 - time (sec): 152.69 - samples/sec: 1925.35 - lr: 0.000011 - momentum: 0.000000
2023-10-15 11:45:51,573 epoch 7 - iter 2340/2606 - loss 0.02854297 - time (sec): 171.40 - samples/sec: 1925.01 - lr: 0.000010 - momentum: 0.000000
2023-10-15 11:46:10,274 epoch 7 - iter 2600/2606 - loss 0.02881930 - time (sec): 190.10 - samples/sec: 1926.52 - lr: 0.000010 - momentum: 0.000000
2023-10-15 11:46:10,807 ----------------------------------------------------------------------------------------------------
2023-10-15 11:46:10,807 EPOCH 7 done: loss 0.0288 - lr: 0.000010
2023-10-15 11:46:19,724 DEV : loss 0.4418008029460907 - f1-score (micro avg) 0.3586
2023-10-15 11:46:19,749 ----------------------------------------------------------------------------------------------------
2023-10-15 11:46:38,755 epoch 8 - iter 260/2606 - loss 0.01765218 - time (sec): 19.01 - samples/sec: 1966.68 - lr: 0.000010 - momentum: 0.000000
2023-10-15 11:46:56,956 epoch 8 - iter 520/2606 - loss 0.02088866 - time (sec): 37.21 - samples/sec: 1945.95 - lr: 0.000009 - momentum: 0.000000
2023-10-15 11:47:16,035 epoch 8 - iter 780/2606 - loss 0.02077464 - time (sec): 56.28 - samples/sec: 1935.26 - lr: 0.000009 - momentum: 0.000000
2023-10-15 11:47:34,625 epoch 8 - iter 1040/2606 - loss 0.01935007 - time (sec): 74.87 - samples/sec: 1931.34 - lr: 0.000009 - momentum: 0.000000
2023-10-15 11:47:53,113 epoch 8 - iter 1300/2606 - loss 0.01961430 - time (sec): 93.36 - samples/sec: 1933.92 - lr: 0.000008 - momentum: 0.000000
2023-10-15 11:48:12,999 epoch 8 - iter 1560/2606 - loss 0.02100904 - time (sec): 113.25 - samples/sec: 1946.62 - lr: 0.000008 - momentum: 0.000000
2023-10-15 11:48:31,646 epoch 8 - iter 1820/2606 - loss 0.02141028 - time (sec): 131.90 - samples/sec: 1953.85 - lr: 0.000008 - momentum: 0.000000
2023-10-15 11:48:50,341 epoch 8 - iter 2080/2606 - loss 0.02087340 - time (sec): 150.59 - samples/sec: 1939.85 - lr: 0.000007 - momentum: 0.000000
2023-10-15 11:49:09,307 epoch 8 - iter 2340/2606 - loss 0.02019028 - time (sec): 169.56 - samples/sec: 1943.76 - lr: 0.000007 - momentum: 0.000000
2023-10-15 11:49:28,050 epoch 8 - iter 2600/2606 - loss 0.02013491 - time (sec): 188.30 - samples/sec: 1945.41 - lr: 0.000007 - momentum: 0.000000
2023-10-15 11:49:28,555 ----------------------------------------------------------------------------------------------------
2023-10-15 11:49:28,555 EPOCH 8 done: loss 0.0201 - lr: 0.000007
2023-10-15 11:49:36,740 DEV : loss 0.40716004371643066 - f1-score (micro avg) 0.3727
2023-10-15 11:49:36,765 ----------------------------------------------------------------------------------------------------
2023-10-15 11:49:57,592 epoch 9 - iter 260/2606 - loss 0.01784026 - time (sec): 20.83 - samples/sec: 1894.37 - lr: 0.000006 - momentum: 0.000000
2023-10-15 11:50:16,456 epoch 9 - iter 520/2606 - loss 0.01374666 - time (sec): 39.69 - samples/sec: 1939.75 - lr: 0.000006 - momentum: 0.000000
2023-10-15 11:50:34,107 epoch 9 - iter 780/2606 - loss 0.01346939 - time (sec): 57.34 - samples/sec: 1927.06 - lr: 0.000006 - momentum: 0.000000
2023-10-15 11:50:53,332 epoch 9 - iter 1040/2606 - loss 0.01388641 - time (sec): 76.57 - samples/sec: 1934.95 - lr: 0.000005 - momentum: 0.000000
2023-10-15 11:51:12,267 epoch 9 - iter 1300/2606 - loss 0.01330099 - time (sec): 95.50 - samples/sec: 1932.69 - lr: 0.000005 - momentum: 0.000000
2023-10-15 11:51:30,969 epoch 9 - iter 1560/2606 - loss 0.01318018 - time (sec): 114.20 - samples/sec: 1927.10 - lr: 0.000005 - momentum: 0.000000
2023-10-15 11:51:49,982 epoch 9 - iter 1820/2606 - loss 0.01362242 - time (sec): 133.22 - samples/sec: 1919.43 - lr: 0.000004 - momentum: 0.000000
2023-10-15 11:52:09,059 epoch 9 - iter 2080/2606 - loss 0.01379042 - time (sec): 152.29 - samples/sec: 1920.32 - lr: 0.000004 - momentum: 0.000000
2023-10-15 11:52:27,174 epoch 9 - iter 2340/2606 - loss 0.01323694 - time (sec): 170.41 - samples/sec: 1916.43 - lr: 0.000004 - momentum: 0.000000
2023-10-15 11:52:47,281 epoch 9 - iter 2600/2606 - loss 0.01336791 - time (sec): 190.51 - samples/sec: 1921.44 - lr: 0.000003 - momentum: 0.000000
2023-10-15 11:52:47,906 ----------------------------------------------------------------------------------------------------
2023-10-15 11:52:47,906 EPOCH 9 done: loss 0.0133 - lr: 0.000003
2023-10-15 11:52:56,091 DEV : loss 0.4836600720882416 - f1-score (micro avg) 0.3695
2023-10-15 11:52:56,116 ----------------------------------------------------------------------------------------------------
2023-10-15 11:53:15,511 epoch 10 - iter 260/2606 - loss 0.00761937 - time (sec): 19.39 - samples/sec: 1973.81 - lr: 0.000003 - momentum: 0.000000
2023-10-15 11:53:34,937 epoch 10 - iter 520/2606 - loss 0.01059122 - time (sec): 38.82 - samples/sec: 1950.47 - lr: 0.000003 - momentum: 0.000000
2023-10-15 11:53:54,907 epoch 10 - iter 780/2606 - loss 0.00959745 - time (sec): 58.79 - samples/sec: 1956.66 - lr: 0.000002 - momentum: 0.000000
2023-10-15 11:54:14,832 epoch 10 - iter 1040/2606 - loss 0.00974413 - time (sec): 78.72 - samples/sec: 1933.32 - lr: 0.000002 - momentum: 0.000000
2023-10-15 11:54:33,515 epoch 10 - iter 1300/2606 - loss 0.00955880 - time (sec): 97.40 - samples/sec: 1932.05 - lr: 0.000002 - momentum: 0.000000
2023-10-15 11:54:51,472 epoch 10 - iter 1560/2606 - loss 0.00930499 - time (sec): 115.36 - samples/sec: 1919.99 - lr: 0.000001 - momentum: 0.000000
2023-10-15 11:55:09,895 epoch 10 - iter 1820/2606 - loss 0.00896874 - time (sec): 133.78 - samples/sec: 1923.78 - lr: 0.000001 - momentum: 0.000000
2023-10-15 11:55:28,885 epoch 10 - iter 2080/2606 - loss 0.00899681 - time (sec): 152.77 - samples/sec: 1925.01 - lr: 0.000001 - momentum: 0.000000
2023-10-15 11:55:48,156 epoch 10 - iter 2340/2606 - loss 0.00934111 - time (sec): 172.04 - samples/sec: 1926.74 - lr: 0.000000 - momentum: 0.000000
2023-10-15 11:56:06,065 epoch 10 - iter 2600/2606 - loss 0.00967478 - time (sec): 189.95 - samples/sec: 1929.90 - lr: 0.000000 - momentum: 0.000000
2023-10-15 11:56:06,427 ----------------------------------------------------------------------------------------------------
2023-10-15 11:56:06,427 EPOCH 10 done: loss 0.0097 - lr: 0.000000
2023-10-15 11:56:14,658 DEV : loss 0.5250458717346191 - f1-score (micro avg) 0.3623
2023-10-15 11:56:15,086 ----------------------------------------------------------------------------------------------------
2023-10-15 11:56:15,087 Loading model from best epoch ...
2023-10-15 11:56:16,909 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-15 11:56:31,670 Results:
- F-score (micro) 0.4262
- F-score (macro) 0.2901
- Accuracy 0.2744

By class:
              precision    recall  f1-score   support

         LOC     0.4727    0.5066    0.4891      1214
         PER     0.3735    0.4369    0.4027       808
         ORG     0.2800    0.2578    0.2684       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4106    0.4431    0.4262      2390
   macro avg     0.2816    0.3003    0.2901      2390
weighted avg     0.4078    0.4431    0.4242      2390

2023-10-15 11:56:31,671 ----------------------------------------------------------------------------------------------------
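The saved best-model.pt corresponds to the last epoch where dev micro-F1 improved (epoch 6, 0.3759); on the test split it reaches micro-F1 0.4262 as reported above. A minimal sketch of loading that checkpoint for inference, assuming Flair's standard load/predict API; the German example sentence is invented for illustration:

```python
# Load the checkpoint written during the run above and tag one sentence.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-newseye/de-dbmdz/bert-base-historic-multilingual-cased"
    "-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-1/best-model.pt"
)

sentence = Sentence("Kaiser Franz Joseph reiste von Wien nach Budapest .")
tagger.predict(sentence)

# entity spans are decoded from the BIOES tag set listed above
for span in sentence.get_spans("ner"):
    print(span.text, span.tag, round(span.score, 2))
```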