2023-10-15 16:38:09,257 ----------------------------------------------------------------------------------------------------
2023-10-15 16:38:09,258 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-15 16:38:09,258 ----------------------------------------------------------------------------------------------------
2023-10-15 16:38:09,258 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-15 16:38:09,258 ----------------------------------------------------------------------------------------------------
2023-10-15 16:38:09,258 Train:  20847 sentences
2023-10-15 16:38:09,258         (train_with_dev=False, train_with_test=False)
2023-10-15 16:38:09,258 ----------------------------------------------------------------------------------------------------
2023-10-15 16:38:09,258 Training Params:
2023-10-15 16:38:09,258  - learning_rate: "3e-05"
2023-10-15 16:38:09,258  - mini_batch_size: "8"
2023-10-15 16:38:09,258  - max_epochs: "10"
2023-10-15 16:38:09,258  - shuffle: "True"
2023-10-15 16:38:09,258 ----------------------------------------------------------------------------------------------------
2023-10-15 16:38:09,258 Plugins:
2023-10-15 16:38:09,258  - LinearScheduler | warmup_fraction: '0.1'
2023-10-15 16:38:09,258 ----------------------------------------------------------------------------------------------------
2023-10-15 16:38:09,258 Final evaluation on model from best epoch (best-model.pt)
2023-10-15 16:38:09,258  - metric: "('micro avg', 'f1-score')"
2023-10-15 16:38:09,258 ----------------------------------------------------------------------------------------------------
2023-10-15 16:38:09,258 Computation:
2023-10-15 16:38:09,258  - compute on device: cuda:0
2023-10-15 16:38:09,258  - embedding storage: none
2023-10-15 16:38:09,258 ----------------------------------------------------------------------------------------------------
2023-10-15 16:38:09,258 Model training base path: "hmbench-newseye/de-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-3"
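
The header above pins down the full fine-tuning recipe: dbmdz/bert-base-historic-multilingual-cased on the HIPE-2022 newseye German split, batch size 8, 10 epochs, peak learning rate 3e-05, first-subtoken pooling over the last transformer layer, and no CRF (a bare 768-to-17 linear head with cross-entropy loss). A minimal sketch of how a comparable run could be set up with the Flair API; the corpus-loader arguments and trainer defaults are assumptions based on recent Flair releases, not something this log records.

# Sketch of a comparable Flair fine-tuning run. Assumptions (not taken from
# this log): Flair >= 0.12 with the NER_HIPE_2022 loader and these argument
# names; the base path is abbreviated.
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

corpus = NER_HIPE_2022(dataset_name="newseye", language="de")
label_dict = corpus.make_label_dictionary(label_type="ner")

# Matches the logged config: last layer only ("layers-1"), first-subtoken
# pooling ("poolingfirst"), embeddings trained along with the head.
embeddings = TransformerWordEmbeddings(
    model="dbmdz/bert-base-historic-multilingual-cased",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# "crfFalse" in the base path and the printed module list (LockedDropout ->
# Linear(768, 17) -> CrossEntropyLoss) imply no CRF and no RNN on top.
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-newseye/...",  # full base path as printed above
    learning_rate=3e-5,
    mini_batch_size=8,
    max_epochs=10,
)
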
"hmbench-newseye/de-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-3" 2023-10-15 16:38:09,259 ---------------------------------------------------------------------------------------------------- 2023-10-15 16:38:09,259 ---------------------------------------------------------------------------------------------------- 2023-10-15 16:38:28,045 epoch 1 - iter 260/2606 - loss 1.78304728 - time (sec): 18.79 - samples/sec: 1856.12 - lr: 0.000003 - momentum: 0.000000 2023-10-15 16:38:46,594 epoch 1 - iter 520/2606 - loss 1.15216820 - time (sec): 37.33 - samples/sec: 1898.75 - lr: 0.000006 - momentum: 0.000000 2023-10-15 16:39:05,821 epoch 1 - iter 780/2606 - loss 0.86568725 - time (sec): 56.56 - samples/sec: 1910.94 - lr: 0.000009 - momentum: 0.000000 2023-10-15 16:39:25,235 epoch 1 - iter 1040/2606 - loss 0.70886674 - time (sec): 75.98 - samples/sec: 1914.21 - lr: 0.000012 - momentum: 0.000000 2023-10-15 16:39:44,071 epoch 1 - iter 1300/2606 - loss 0.61350901 - time (sec): 94.81 - samples/sec: 1924.89 - lr: 0.000015 - momentum: 0.000000 2023-10-15 16:40:03,717 epoch 1 - iter 1560/2606 - loss 0.54047028 - time (sec): 114.46 - samples/sec: 1940.17 - lr: 0.000018 - momentum: 0.000000 2023-10-15 16:40:22,096 epoch 1 - iter 1820/2606 - loss 0.49291843 - time (sec): 132.84 - samples/sec: 1946.58 - lr: 0.000021 - momentum: 0.000000 2023-10-15 16:40:40,387 epoch 1 - iter 2080/2606 - loss 0.45739180 - time (sec): 151.13 - samples/sec: 1951.82 - lr: 0.000024 - momentum: 0.000000 2023-10-15 16:40:59,388 epoch 1 - iter 2340/2606 - loss 0.43100594 - time (sec): 170.13 - samples/sec: 1932.42 - lr: 0.000027 - momentum: 0.000000 2023-10-15 16:41:18,986 epoch 1 - iter 2600/2606 - loss 0.40515264 - time (sec): 189.73 - samples/sec: 1933.21 - lr: 0.000030 - momentum: 0.000000 2023-10-15 16:41:19,367 ---------------------------------------------------------------------------------------------------- 2023-10-15 16:41:19,367 EPOCH 1 done: loss 0.4047 - lr: 0.000030 2023-10-15 16:41:25,180 DEV : loss 0.1570487767457962 - f1-score (micro avg) 0.2895 2023-10-15 16:41:25,209 saving best model 2023-10-15 16:41:25,591 ---------------------------------------------------------------------------------------------------- 2023-10-15 16:41:43,895 epoch 2 - iter 260/2606 - loss 0.20098129 - time (sec): 18.30 - samples/sec: 1820.25 - lr: 0.000030 - momentum: 0.000000 2023-10-15 16:42:02,865 epoch 2 - iter 520/2606 - loss 0.18249185 - time (sec): 37.27 - samples/sec: 1887.05 - lr: 0.000029 - momentum: 0.000000 2023-10-15 16:42:21,212 epoch 2 - iter 780/2606 - loss 0.17192597 - time (sec): 55.62 - samples/sec: 1902.07 - lr: 0.000029 - momentum: 0.000000 2023-10-15 16:42:40,345 epoch 2 - iter 1040/2606 - loss 0.16132827 - time (sec): 74.75 - samples/sec: 1912.44 - lr: 0.000029 - momentum: 0.000000 2023-10-15 16:42:58,644 epoch 2 - iter 1300/2606 - loss 0.16050052 - time (sec): 93.05 - samples/sec: 1920.11 - lr: 0.000028 - momentum: 0.000000 2023-10-15 16:43:18,524 epoch 2 - iter 1560/2606 - loss 0.15792990 - time (sec): 112.93 - samples/sec: 1931.30 - lr: 0.000028 - momentum: 0.000000 2023-10-15 16:43:37,445 epoch 2 - iter 1820/2606 - loss 0.15662757 - time (sec): 131.85 - samples/sec: 1927.40 - lr: 0.000028 - momentum: 0.000000 2023-10-15 16:43:56,056 epoch 2 - iter 2080/2606 - loss 0.15414219 - time (sec): 150.46 - samples/sec: 1930.09 - lr: 0.000027 - momentum: 0.000000 2023-10-15 16:44:15,590 epoch 2 - iter 2340/2606 - loss 0.15078285 - time (sec): 170.00 - samples/sec: 1928.67 
2023-10-15 16:44:35,237 epoch 2 - iter 2600/2606 - loss 0.14899030 - time (sec): 189.64 - samples/sec: 1930.76 - lr: 0.000027 - momentum: 0.000000
2023-10-15 16:44:35,739 ----------------------------------------------------------------------------------------------------
2023-10-15 16:44:35,739 EPOCH 2 done: loss 0.1487 - lr: 0.000027
2023-10-15 16:44:44,918 DEV : loss 0.14093472063541412 - f1-score (micro avg) 0.3357
2023-10-15 16:44:44,945 saving best model
2023-10-15 16:44:45,501 ----------------------------------------------------------------------------------------------------
2023-10-15 16:45:03,303 epoch 3 - iter 260/2606 - loss 0.10491536 - time (sec): 17.80 - samples/sec: 1883.64 - lr: 0.000026 - momentum: 0.000000
2023-10-15 16:45:21,796 epoch 3 - iter 520/2606 - loss 0.10122591 - time (sec): 36.29 - samples/sec: 1890.96 - lr: 0.000026 - momentum: 0.000000
2023-10-15 16:45:42,689 epoch 3 - iter 780/2606 - loss 0.09961218 - time (sec): 57.18 - samples/sec: 1923.87 - lr: 0.000026 - momentum: 0.000000
2023-10-15 16:46:02,554 epoch 3 - iter 1040/2606 - loss 0.09635866 - time (sec): 77.05 - samples/sec: 1914.08 - lr: 0.000025 - momentum: 0.000000
2023-10-15 16:46:21,515 epoch 3 - iter 1300/2606 - loss 0.09838515 - time (sec): 96.01 - samples/sec: 1921.98 - lr: 0.000025 - momentum: 0.000000
2023-10-15 16:46:39,827 epoch 3 - iter 1560/2606 - loss 0.09935068 - time (sec): 114.32 - samples/sec: 1931.86 - lr: 0.000025 - momentum: 0.000000
2023-10-15 16:46:58,868 epoch 3 - iter 1820/2606 - loss 0.09839679 - time (sec): 133.36 - samples/sec: 1931.00 - lr: 0.000024 - momentum: 0.000000
2023-10-15 16:47:17,465 epoch 3 - iter 2080/2606 - loss 0.09880177 - time (sec): 151.96 - samples/sec: 1925.39 - lr: 0.000024 - momentum: 0.000000
2023-10-15 16:47:35,970 epoch 3 - iter 2340/2606 - loss 0.09778731 - time (sec): 170.46 - samples/sec: 1926.54 - lr: 0.000024 - momentum: 0.000000
2023-10-15 16:47:55,086 epoch 3 - iter 2600/2606 - loss 0.09692475 - time (sec): 189.58 - samples/sec: 1934.04 - lr: 0.000023 - momentum: 0.000000
2023-10-15 16:47:55,463 ----------------------------------------------------------------------------------------------------
2023-10-15 16:47:55,463 EPOCH 3 done: loss 0.0970 - lr: 0.000023
2023-10-15 16:48:03,673 DEV : loss 0.20114129781723022 - f1-score (micro avg) 0.3372
2023-10-15 16:48:03,699 saving best model
2023-10-15 16:48:04,299 ----------------------------------------------------------------------------------------------------
2023-10-15 16:48:22,997 epoch 4 - iter 260/2606 - loss 0.05906672 - time (sec): 18.70 - samples/sec: 1981.27 - lr: 0.000023 - momentum: 0.000000
2023-10-15 16:48:42,832 epoch 4 - iter 520/2606 - loss 0.06444519 - time (sec): 38.53 - samples/sec: 1949.40 - lr: 0.000023 - momentum: 0.000000
2023-10-15 16:49:01,447 epoch 4 - iter 780/2606 - loss 0.06892697 - time (sec): 57.15 - samples/sec: 1942.37 - lr: 0.000022 - momentum: 0.000000
2023-10-15 16:49:20,367 epoch 4 - iter 1040/2606 - loss 0.06689883 - time (sec): 76.07 - samples/sec: 1935.40 - lr: 0.000022 - momentum: 0.000000
2023-10-15 16:49:39,006 epoch 4 - iter 1300/2606 - loss 0.06644882 - time (sec): 94.71 - samples/sec: 1937.06 - lr: 0.000022 - momentum: 0.000000
2023-10-15 16:49:58,334 epoch 4 - iter 1560/2606 - loss 0.06701638 - time (sec): 114.03 - samples/sec: 1942.46 - lr: 0.000021 - momentum: 0.000000
2023-10-15 16:50:17,146 epoch 4 - iter 1820/2606 - loss 0.06820439 - time (sec): 132.85 - samples/sec: 1947.78 - lr: 0.000021 - momentum: 0.000000
2023-10-15 16:50:36,366 epoch 4 - iter 2080/2606 - loss 0.06885462 - time (sec): 152.07 - samples/sec: 1934.72 - lr: 0.000021 - momentum: 0.000000
2023-10-15 16:50:54,739 epoch 4 - iter 2340/2606 - loss 0.06876316 - time (sec): 170.44 - samples/sec: 1931.29 - lr: 0.000020 - momentum: 0.000000
2023-10-15 16:51:13,995 epoch 4 - iter 2600/2606 - loss 0.06837534 - time (sec): 189.70 - samples/sec: 1933.23 - lr: 0.000020 - momentum: 0.000000
2023-10-15 16:51:14,350 ----------------------------------------------------------------------------------------------------
2023-10-15 16:51:14,350 EPOCH 4 done: loss 0.0687 - lr: 0.000020
2023-10-15 16:51:22,591 DEV : loss 0.30411383509635925 - f1-score (micro avg) 0.3834
2023-10-15 16:51:22,619 saving best model
2023-10-15 16:51:23,272 ----------------------------------------------------------------------------------------------------
2023-10-15 16:51:44,016 epoch 5 - iter 260/2606 - loss 0.05158986 - time (sec): 20.74 - samples/sec: 1921.40 - lr: 0.000020 - momentum: 0.000000
2023-10-15 16:52:02,369 epoch 5 - iter 520/2606 - loss 0.05047142 - time (sec): 39.10 - samples/sec: 1943.13 - lr: 0.000019 - momentum: 0.000000
2023-10-15 16:52:20,621 epoch 5 - iter 780/2606 - loss 0.05164843 - time (sec): 57.35 - samples/sec: 1939.10 - lr: 0.000019 - momentum: 0.000000
2023-10-15 16:52:40,125 epoch 5 - iter 1040/2606 - loss 0.04863786 - time (sec): 76.85 - samples/sec: 1950.69 - lr: 0.000019 - momentum: 0.000000
2023-10-15 16:52:59,496 epoch 5 - iter 1300/2606 - loss 0.05114388 - time (sec): 96.22 - samples/sec: 1942.27 - lr: 0.000018 - momentum: 0.000000
2023-10-15 16:53:18,027 epoch 5 - iter 1560/2606 - loss 0.05030163 - time (sec): 114.75 - samples/sec: 1934.20 - lr: 0.000018 - momentum: 0.000000
2023-10-15 16:53:36,276 epoch 5 - iter 1820/2606 - loss 0.04918969 - time (sec): 133.00 - samples/sec: 1927.14 - lr: 0.000018 - momentum: 0.000000
2023-10-15 16:53:55,482 epoch 5 - iter 2080/2606 - loss 0.04969722 - time (sec): 152.21 - samples/sec: 1928.53 - lr: 0.000017 - momentum: 0.000000
2023-10-15 16:54:13,930 epoch 5 - iter 2340/2606 - loss 0.04985016 - time (sec): 170.66 - samples/sec: 1933.53 - lr: 0.000017 - momentum: 0.000000
2023-10-15 16:54:32,872 epoch 5 - iter 2600/2606 - loss 0.05013012 - time (sec): 189.60 - samples/sec: 1933.97 - lr: 0.000017 - momentum: 0.000000
2023-10-15 16:54:33,288 ----------------------------------------------------------------------------------------------------
2023-10-15 16:54:33,288 EPOCH 5 done: loss 0.0501 - lr: 0.000017
2023-10-15 16:54:41,515 DEV : loss 0.3421619236469269 - f1-score (micro avg) 0.3831
2023-10-15 16:54:41,542 ----------------------------------------------------------------------------------------------------
2023-10-15 16:55:00,271 epoch 6 - iter 260/2606 - loss 0.02836933 - time (sec): 18.73 - samples/sec: 1912.47 - lr: 0.000016 - momentum: 0.000000
2023-10-15 16:55:19,461 epoch 6 - iter 520/2606 - loss 0.03322309 - time (sec): 37.92 - samples/sec: 1958.69 - lr: 0.000016 - momentum: 0.000000
2023-10-15 16:55:39,110 epoch 6 - iter 780/2606 - loss 0.03417720 - time (sec): 57.57 - samples/sec: 1947.43 - lr: 0.000016 - momentum: 0.000000
2023-10-15 16:55:58,123 epoch 6 - iter 1040/2606 - loss 0.03561005 - time (sec): 76.58 - samples/sec: 1936.64 - lr: 0.000015 - momentum: 0.000000
2023-10-15 16:56:15,638 epoch 6 - iter 1300/2606 - loss 0.03797069 - time (sec): 94.09 - samples/sec: 1934.24 - lr: 0.000015 - momentum: 0.000000
2023-10-15 16:56:35,345 epoch 6 - iter 1560/2606 - loss 0.03933476 - time (sec): 113.80 - samples/sec: 1940.11 - lr: 0.000015 - momentum: 0.000000
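
Every epoch ends with a DEV evaluation, and "saving best model" is emitted only when the dev micro-F1 improves on the best value seen so far; epoch 5 above (0.3831 vs. epoch 4's 0.3834) is the first epoch that does not produce a checkpoint. An illustrative rendering of that selection rule, not Flair's actual implementation:

# Illustrative rendering of the checkpointing rule visible in this log; not
# Flair's actual code. dev_scores are the per-epoch dev micro-F1 values logged.
dev_scores = [0.2895, 0.3357, 0.3372, 0.3834, 0.3831,
              0.3727, 0.4043, 0.4087, 0.3708, 0.3916]

best_f1 = 0.0
for epoch, f1 in enumerate(dev_scores, start=1):
    if f1 > best_f1:  # strictly better dev micro-F1 -> overwrite best-model.pt
        best_f1 = f1
        print(f"epoch {epoch}: saving best model (dev f1 {f1:.4f})")
# Prints for epochs 1-4, 7 and 8; best-model.pt therefore comes from epoch 8.
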
2023-10-15 16:56:53,573 epoch 6 - iter 1820/2606 - loss 0.03849459 - time (sec): 132.03 - samples/sec: 1940.99 - lr: 0.000014 - momentum: 0.000000
2023-10-15 16:57:14,002 epoch 6 - iter 2080/2606 - loss 0.03816553 - time (sec): 152.46 - samples/sec: 1927.13 - lr: 0.000014 - momentum: 0.000000
2023-10-15 16:57:32,779 epoch 6 - iter 2340/2606 - loss 0.03680409 - time (sec): 171.24 - samples/sec: 1928.56 - lr: 0.000014 - momentum: 0.000000
2023-10-15 16:57:51,496 epoch 6 - iter 2600/2606 - loss 0.03661753 - time (sec): 189.95 - samples/sec: 1928.30 - lr: 0.000013 - momentum: 0.000000
2023-10-15 16:57:52,077 ----------------------------------------------------------------------------------------------------
2023-10-15 16:57:52,077 EPOCH 6 done: loss 0.0367 - lr: 0.000013
2023-10-15 16:58:00,492 DEV : loss 0.40107467770576477 - f1-score (micro avg) 0.3727
2023-10-15 16:58:00,522 ----------------------------------------------------------------------------------------------------
2023-10-15 16:58:18,805 epoch 7 - iter 260/2606 - loss 0.02807878 - time (sec): 18.28 - samples/sec: 1847.43 - lr: 0.000013 - momentum: 0.000000
2023-10-15 16:58:37,987 epoch 7 - iter 520/2606 - loss 0.02940039 - time (sec): 37.46 - samples/sec: 1892.59 - lr: 0.000013 - momentum: 0.000000
2023-10-15 16:58:56,264 epoch 7 - iter 780/2606 - loss 0.02822614 - time (sec): 55.74 - samples/sec: 1876.29 - lr: 0.000012 - momentum: 0.000000
2023-10-15 16:59:15,716 epoch 7 - iter 1040/2606 - loss 0.02631228 - time (sec): 75.19 - samples/sec: 1894.14 - lr: 0.000012 - momentum: 0.000000
2023-10-15 16:59:35,276 epoch 7 - iter 1300/2606 - loss 0.02795012 - time (sec): 94.75 - samples/sec: 1905.97 - lr: 0.000012 - momentum: 0.000000
2023-10-15 16:59:53,587 epoch 7 - iter 1560/2606 - loss 0.02688673 - time (sec): 113.06 - samples/sec: 1916.11 - lr: 0.000011 - momentum: 0.000000
2023-10-15 17:00:12,419 epoch 7 - iter 1820/2606 - loss 0.02730602 - time (sec): 131.90 - samples/sec: 1933.54 - lr: 0.000011 - momentum: 0.000000
2023-10-15 17:00:31,363 epoch 7 - iter 2080/2606 - loss 0.02716500 - time (sec): 150.84 - samples/sec: 1937.16 - lr: 0.000011 - momentum: 0.000000
2023-10-15 17:00:50,648 epoch 7 - iter 2340/2606 - loss 0.02641485 - time (sec): 170.12 - samples/sec: 1931.45 - lr: 0.000010 - momentum: 0.000000
2023-10-15 17:01:10,567 epoch 7 - iter 2600/2606 - loss 0.02685152 - time (sec): 190.04 - samples/sec: 1929.68 - lr: 0.000010 - momentum: 0.000000
2023-10-15 17:01:11,036 ----------------------------------------------------------------------------------------------------
2023-10-15 17:01:11,036 EPOCH 7 done: loss 0.0268 - lr: 0.000010
2023-10-15 17:01:19,242 DEV : loss 0.36185500025749207 - f1-score (micro avg) 0.4043
2023-10-15 17:01:19,273 saving best model
2023-10-15 17:01:19,919 ----------------------------------------------------------------------------------------------------
2023-10-15 17:01:39,331 epoch 8 - iter 260/2606 - loss 0.01643423 - time (sec): 19.41 - samples/sec: 2026.45 - lr: 0.000010 - momentum: 0.000000
2023-10-15 17:01:59,051 epoch 8 - iter 520/2606 - loss 0.01851950 - time (sec): 39.13 - samples/sec: 1976.19 - lr: 0.000009 - momentum: 0.000000
2023-10-15 17:02:18,480 epoch 8 - iter 780/2606 - loss 0.01850720 - time (sec): 58.56 - samples/sec: 1959.55 - lr: 0.000009 - momentum: 0.000000
2023-10-15 17:02:36,845 epoch 8 - iter 1040/2606 - loss 0.02011290 - time (sec): 76.92 - samples/sec: 1946.80 - lr: 0.000009 - momentum: 0.000000
2023-10-15 17:02:55,283 epoch 8 - iter 1300/2606 - loss 0.01977515 - time (sec): 95.36 - samples/sec: 1940.14 - lr: 0.000008 - momentum: 0.000000
2023-10-15 17:03:14,146 epoch 8 - iter 1560/2606 - loss 0.01909661 - time (sec): 114.22 - samples/sec: 1935.70 - lr: 0.000008 - momentum: 0.000000
2023-10-15 17:03:33,519 epoch 8 - iter 1820/2606 - loss 0.01939229 - time (sec): 133.60 - samples/sec: 1937.25 - lr: 0.000008 - momentum: 0.000000
2023-10-15 17:03:51,713 epoch 8 - iter 2080/2606 - loss 0.02001143 - time (sec): 151.79 - samples/sec: 1932.90 - lr: 0.000007 - momentum: 0.000000
2023-10-15 17:04:10,894 epoch 8 - iter 2340/2606 - loss 0.01961925 - time (sec): 170.97 - samples/sec: 1929.77 - lr: 0.000007 - momentum: 0.000000
2023-10-15 17:04:29,158 epoch 8 - iter 2600/2606 - loss 0.02003412 - time (sec): 189.23 - samples/sec: 1934.90 - lr: 0.000007 - momentum: 0.000000
2023-10-15 17:04:29,709 ----------------------------------------------------------------------------------------------------
2023-10-15 17:04:29,709 EPOCH 8 done: loss 0.0200 - lr: 0.000007
2023-10-15 17:04:38,859 DEV : loss 0.4060736298561096 - f1-score (micro avg) 0.4087
2023-10-15 17:04:38,892 saving best model
2023-10-15 17:04:39,411 ----------------------------------------------------------------------------------------------------
2023-10-15 17:04:58,303 epoch 9 - iter 260/2606 - loss 0.01725916 - time (sec): 18.89 - samples/sec: 1881.58 - lr: 0.000006 - momentum: 0.000000
2023-10-15 17:05:17,100 epoch 9 - iter 520/2606 - loss 0.01816425 - time (sec): 37.68 - samples/sec: 1897.84 - lr: 0.000006 - momentum: 0.000000
2023-10-15 17:05:35,858 epoch 9 - iter 780/2606 - loss 0.01585953 - time (sec): 56.44 - samples/sec: 1915.98 - lr: 0.000006 - momentum: 0.000000
2023-10-15 17:05:54,651 epoch 9 - iter 1040/2606 - loss 0.01513412 - time (sec): 75.24 - samples/sec: 1919.13 - lr: 0.000005 - momentum: 0.000000
2023-10-15 17:06:13,353 epoch 9 - iter 1300/2606 - loss 0.01568114 - time (sec): 93.94 - samples/sec: 1914.89 - lr: 0.000005 - momentum: 0.000000
2023-10-15 17:06:33,351 epoch 9 - iter 1560/2606 - loss 0.01491052 - time (sec): 113.94 - samples/sec: 1920.33 - lr: 0.000005 - momentum: 0.000000
2023-10-15 17:06:52,085 epoch 9 - iter 1820/2606 - loss 0.01453602 - time (sec): 132.67 - samples/sec: 1923.52 - lr: 0.000004 - momentum: 0.000000
2023-10-15 17:07:10,935 epoch 9 - iter 2080/2606 - loss 0.01454802 - time (sec): 151.52 - samples/sec: 1919.01 - lr: 0.000004 - momentum: 0.000000
2023-10-15 17:07:30,661 epoch 9 - iter 2340/2606 - loss 0.01402508 - time (sec): 171.25 - samples/sec: 1920.68 - lr: 0.000004 - momentum: 0.000000
2023-10-15 17:07:50,213 epoch 9 - iter 2600/2606 - loss 0.01358373 - time (sec): 190.80 - samples/sec: 1921.95 - lr: 0.000003 - momentum: 0.000000
2023-10-15 17:07:50,566 ----------------------------------------------------------------------------------------------------
2023-10-15 17:07:50,567 EPOCH 9 done: loss 0.0136 - lr: 0.000003
2023-10-15 17:07:59,619 DEV : loss 0.5201111435890198 - f1-score (micro avg) 0.3708
2023-10-15 17:07:59,648 ----------------------------------------------------------------------------------------------------
2023-10-15 17:08:18,216 epoch 10 - iter 260/2606 - loss 0.00949472 - time (sec): 18.57 - samples/sec: 1933.16 - lr: 0.000003 - momentum: 0.000000
2023-10-15 17:08:37,149 epoch 10 - iter 520/2606 - loss 0.00956979 - time (sec): 37.50 - samples/sec: 1952.46 - lr: 0.000003 - momentum: 0.000000
2023-10-15 17:08:55,678 epoch 10 - iter 780/2606 - loss 0.00953648 - time (sec): 56.03 - samples/sec: 1970.66 - lr: 0.000002 - momentum: 0.000000
2023-10-15 17:09:14,776 epoch 10 - iter 1040/2606 - loss 0.00918127 - time (sec): 75.13 - samples/sec: 1951.47 - lr: 0.000002 - momentum: 0.000000
2023-10-15 17:09:34,593 epoch 10 - iter 1300/2606 - loss 0.00921523 - time (sec): 94.94 - samples/sec: 1953.32 - lr: 0.000002 - momentum: 0.000000
2023-10-15 17:09:53,789 epoch 10 - iter 1560/2606 - loss 0.00881008 - time (sec): 114.14 - samples/sec: 1942.44 - lr: 0.000001 - momentum: 0.000000
2023-10-15 17:10:12,660 epoch 10 - iter 1820/2606 - loss 0.00913303 - time (sec): 133.01 - samples/sec: 1941.32 - lr: 0.000001 - momentum: 0.000000
2023-10-15 17:10:31,001 epoch 10 - iter 2080/2606 - loss 0.00971053 - time (sec): 151.35 - samples/sec: 1938.83 - lr: 0.000001 - momentum: 0.000000
2023-10-15 17:10:49,857 epoch 10 - iter 2340/2606 - loss 0.00969515 - time (sec): 170.21 - samples/sec: 1933.79 - lr: 0.000000 - momentum: 0.000000
2023-10-15 17:11:09,481 epoch 10 - iter 2600/2606 - loss 0.00962657 - time (sec): 189.83 - samples/sec: 1932.12 - lr: 0.000000 - momentum: 0.000000
2023-10-15 17:11:09,863 ----------------------------------------------------------------------------------------------------
2023-10-15 17:11:09,863 EPOCH 10 done: loss 0.0096 - lr: 0.000000
2023-10-15 17:11:18,876 DEV : loss 0.4862280786037445 - f1-score (micro avg) 0.3916
2023-10-15 17:11:19,325 ----------------------------------------------------------------------------------------------------
2023-10-15 17:11:19,326 Loading model from best epoch ...
2023-10-15 17:11:20,868 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-15 17:11:37,464 Results:
- F-score (micro) 0.464
- F-score (macro) 0.3025
- Accuracy 0.3068

By class:
              precision    recall  f1-score   support

         LOC     0.4868    0.5783    0.5286      1214
         PER     0.4134    0.5050    0.4546       808
         ORG     0.2706    0.1955    0.2270       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4380    0.4933    0.4640      2390
   macro avg     0.2927    0.3197    0.3025      2390
weighted avg     0.4270    0.4933    0.4557      2390

2023-10-15 17:11:37,464 ----------------------------------------------------------------------------------------------------
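
The best checkpoint (epoch 8, dev micro-F1 0.4087) reaches 0.4640 micro-F1 on the test set, with ORG and the 15-instance HumanProd class pulling the macro average down to 0.3025. A minimal inference sketch for the saved model; the checkpoint path is abbreviated and the example sentence is invented:

# Minimal inference sketch for the saved checkpoint. The path is abbreviated
# here and the German example sentence is invented, not from the corpus.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("hmbench-newseye/.../best-model.pt")

sentence = Sentence("Die Sitzung des Reichstags in Berlin wurde vertagt .")
tagger.predict(sentence)

# Entity spans are decoded from the 17 BIOES tags listed above
# (S- single-token, B-/I-/E- multi-token, O outside any entity).
for span in sentence.get_spans("ner"):
    print(span.text, span.get_label("ner").value, span.score)
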