2023-10-15 17:46:21,046 ----------------------------------------------------------------------------------------------------
2023-10-15 17:46:21,047 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-15 17:46:21,047 ----------------------------------------------------------------------------------------------------
2023-10-15 17:46:21,047 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-15 17:46:21,047 ----------------------------------------------------------------------------------------------------
2023-10-15 17:46:21,047 Train:  20847 sentences
2023-10-15 17:46:21,047         (train_with_dev=False, train_with_test=False)
2023-10-15 17:46:21,048 ----------------------------------------------------------------------------------------------------
2023-10-15 17:46:21,048 Training Params:
2023-10-15 17:46:21,048  - learning_rate: "3e-05" 
2023-10-15 17:46:21,048  - mini_batch_size: "4"
2023-10-15 17:46:21,048  - max_epochs: "10"
2023-10-15 17:46:21,048  - shuffle: "True"
2023-10-15 17:46:21,048 ----------------------------------------------------------------------------------------------------
2023-10-15 17:46:21,048 Plugins:
2023-10-15 17:46:21,048  - LinearScheduler | warmup_fraction: '0.1'
2023-10-15 17:46:21,048 ----------------------------------------------------------------------------------------------------
2023-10-15 17:46:21,048 Final evaluation on model from best epoch (best-model.pt)
2023-10-15 17:46:21,048  - metric: "('micro avg', 'f1-score')"
2023-10-15 17:46:21,048 ----------------------------------------------------------------------------------------------------
2023-10-15 17:46:21,048 Computation:
2023-10-15 17:46:21,048  - compute on device: cuda:0
2023-10-15 17:46:21,048  - embedding storage: none
2023-10-15 17:46:21,048 ----------------------------------------------------------------------------------------------------
2023-10-15 17:46:21,048 Model training base path: "hmbench-newseye/de-dbmdz/bert-base-historic-multilingual-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-3"
2023-10-15 17:46:21,048 ----------------------------------------------------------------------------------------------------
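The hyperparameters and plugins logged above can be read back as a short Flair fine-tuning script. The following is a hedged reconstruction only: the exact `NER_HIPE_2022` loader arguments and the label-dictionary call are assumptions inferred from the run name and corpus line, not taken from this log.

```python
# Sketch reconstructing the logged configuration; corpus-loader kwargs are
# assumptions, and a CUDA device plus dataset download are assumed available.
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

corpus = NER_HIPE_2022(dataset_name="newseye", language="de")  # assumed kwargs

embeddings = TransformerWordEmbeddings(
    "dbmdz/bert-base-historic-multilingual-cased",
    layers="-1",               # "layers-1" in the run name: last layer only
    subtoken_pooling="first",  # "poolingfirst" in the run name
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=corpus.make_label_dictionary("ner"),
    tag_type="ner",
    use_crf=False,             # "crfFalse" in the run name
    use_rnn=False,
    reproject_embeddings=False,
)

# fine_tune() applies a linear LR schedule with warmup by default, matching
# the "LinearScheduler | warmup_fraction: 0.1" plugin logged above.
ModelTrainer(tagger, corpus).fine_tune(
    "hmbench-newseye/de-dbmdz/bert-base-historic-multilingual-cased"
    "-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-3",
    learning_rate=3e-5,
    mini_batch_size=4,
    max_epochs=10,
)
```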
2023-10-15 17:46:21,048 ----------------------------------------------------------------------------------------------------
2023-10-15 17:46:46,375 epoch 1 - iter 521/5212 - loss 1.50097394 - time (sec): 25.33 - samples/sec: 1383.16 - lr: 0.000003 - momentum: 0.000000
2023-10-15 17:47:11,339 epoch 1 - iter 1042/5212 - loss 1.01650844 - time (sec): 50.29 - samples/sec: 1411.09 - lr: 0.000006 - momentum: 0.000000
2023-10-15 17:47:37,066 epoch 1 - iter 1563/5212 - loss 0.77214666 - time (sec): 76.02 - samples/sec: 1423.69 - lr: 0.000009 - momentum: 0.000000
2023-10-15 17:48:02,847 epoch 1 - iter 2084/5212 - loss 0.64040683 - time (sec): 101.80 - samples/sec: 1431.64 - lr: 0.000012 - momentum: 0.000000
2023-10-15 17:48:28,551 epoch 1 - iter 2605/5212 - loss 0.56192130 - time (sec): 127.50 - samples/sec: 1434.16 - lr: 0.000015 - momentum: 0.000000
2023-10-15 17:48:54,484 epoch 1 - iter 3126/5212 - loss 0.49948839 - time (sec): 153.44 - samples/sec: 1449.35 - lr: 0.000018 - momentum: 0.000000
2023-10-15 17:49:20,094 epoch 1 - iter 3647/5212 - loss 0.46113087 - time (sec): 179.04 - samples/sec: 1446.19 - lr: 0.000021 - momentum: 0.000000
2023-10-15 17:49:45,443 epoch 1 - iter 4168/5212 - loss 0.43061045 - time (sec): 204.39 - samples/sec: 1446.65 - lr: 0.000024 - momentum: 0.000000
2023-10-15 17:50:10,290 epoch 1 - iter 4689/5212 - loss 0.41123436 - time (sec): 229.24 - samples/sec: 1435.64 - lr: 0.000027 - momentum: 0.000000
2023-10-15 17:50:36,050 epoch 1 - iter 5210/5212 - loss 0.38867196 - time (sec): 255.00 - samples/sec: 1440.57 - lr: 0.000030 - momentum: 0.000000
2023-10-15 17:50:36,136 ----------------------------------------------------------------------------------------------------
2023-10-15 17:50:36,137 EPOCH 1 done: loss 0.3887 - lr: 0.000030
2023-10-15 17:50:42,081 DEV : loss 0.20117585361003876 - f1-score (micro avg)  0.2861
2023-10-15 17:50:42,110 saving best model
2023-10-15 17:50:42,473 ----------------------------------------------------------------------------------------------------
2023-10-15 17:51:07,318 epoch 2 - iter 521/5212 - loss 0.22240788 - time (sec): 24.84 - samples/sec: 1352.80 - lr: 0.000030 - momentum: 0.000000
2023-10-15 17:51:32,709 epoch 2 - iter 1042/5212 - loss 0.20503019 - time (sec): 50.24 - samples/sec: 1409.19 - lr: 0.000029 - momentum: 0.000000
2023-10-15 17:51:57,674 epoch 2 - iter 1563/5212 - loss 0.19541278 - time (sec): 75.20 - samples/sec: 1411.83 - lr: 0.000029 - momentum: 0.000000
2023-10-15 17:52:23,171 epoch 2 - iter 2084/5212 - loss 0.18582207 - time (sec): 100.70 - samples/sec: 1423.80 - lr: 0.000029 - momentum: 0.000000
2023-10-15 17:52:48,033 epoch 2 - iter 2605/5212 - loss 0.18632288 - time (sec): 125.56 - samples/sec: 1427.21 - lr: 0.000028 - momentum: 0.000000
2023-10-15 17:53:13,971 epoch 2 - iter 3126/5212 - loss 0.18323477 - time (sec): 151.50 - samples/sec: 1443.16 - lr: 0.000028 - momentum: 0.000000
2023-10-15 17:53:39,204 epoch 2 - iter 3647/5212 - loss 0.18006233 - time (sec): 176.73 - samples/sec: 1439.74 - lr: 0.000028 - momentum: 0.000000
2023-10-15 17:54:04,624 epoch 2 - iter 4168/5212 - loss 0.17990351 - time (sec): 202.15 - samples/sec: 1439.47 - lr: 0.000027 - momentum: 0.000000
2023-10-15 17:54:30,039 epoch 2 - iter 4689/5212 - loss 0.17515115 - time (sec): 227.56 - samples/sec: 1442.75 - lr: 0.000027 - momentum: 0.000000
2023-10-15 17:54:55,742 epoch 2 - iter 5210/5212 - loss 0.17338933 - time (sec): 253.27 - samples/sec: 1448.53 - lr: 0.000027 - momentum: 0.000000
2023-10-15 17:54:55,905 ----------------------------------------------------------------------------------------------------
2023-10-15 17:54:55,905 EPOCH 2 done: loss 0.1735 - lr: 0.000027
2023-10-15 17:55:04,829 DEV : loss 0.17898787558078766 - f1-score (micro avg)  0.3658
2023-10-15 17:55:04,856 saving best model
2023-10-15 17:55:05,299 ----------------------------------------------------------------------------------------------------
2023-10-15 17:55:30,120 epoch 3 - iter 521/5212 - loss 0.12227339 - time (sec): 24.82 - samples/sec: 1351.59 - lr: 0.000026 - momentum: 0.000000
2023-10-15 17:55:55,409 epoch 3 - iter 1042/5212 - loss 0.12395854 - time (sec): 50.11 - samples/sec: 1373.02 - lr: 0.000026 - momentum: 0.000000
2023-10-15 17:56:21,481 epoch 3 - iter 1563/5212 - loss 0.11709392 - time (sec): 76.18 - samples/sec: 1451.09 - lr: 0.000026 - momentum: 0.000000
2023-10-15 17:56:47,235 epoch 3 - iter 2084/5212 - loss 0.11296261 - time (sec): 101.93 - samples/sec: 1450.16 - lr: 0.000025 - momentum: 0.000000
2023-10-15 17:57:12,531 epoch 3 - iter 2605/5212 - loss 0.11412983 - time (sec): 127.23 - samples/sec: 1452.46 - lr: 0.000025 - momentum: 0.000000
2023-10-15 17:57:36,931 epoch 3 - iter 3126/5212 - loss 0.11505627 - time (sec): 151.63 - samples/sec: 1462.07 - lr: 0.000025 - momentum: 0.000000
2023-10-15 17:58:01,267 epoch 3 - iter 3647/5212 - loss 0.11553900 - time (sec): 175.97 - samples/sec: 1465.80 - lr: 0.000024 - momentum: 0.000000
2023-10-15 17:58:26,228 epoch 3 - iter 4168/5212 - loss 0.11759207 - time (sec): 200.93 - samples/sec: 1457.48 - lr: 0.000024 - momentum: 0.000000
2023-10-15 17:58:51,153 epoch 3 - iter 4689/5212 - loss 0.11866732 - time (sec): 225.85 - samples/sec: 1457.93 - lr: 0.000024 - momentum: 0.000000
2023-10-15 17:59:16,200 epoch 3 - iter 5210/5212 - loss 0.11854949 - time (sec): 250.90 - samples/sec: 1464.12 - lr: 0.000023 - momentum: 0.000000
2023-10-15 17:59:16,294 ----------------------------------------------------------------------------------------------------
2023-10-15 17:59:16,294 EPOCH 3 done: loss 0.1185 - lr: 0.000023
2023-10-15 17:59:25,325 DEV : loss 0.24826383590698242 - f1-score (micro avg)  0.3206
2023-10-15 17:59:25,351 ----------------------------------------------------------------------------------------------------
2023-10-15 17:59:50,986 epoch 4 - iter 521/5212 - loss 0.07086836 - time (sec): 25.63 - samples/sec: 1446.52 - lr: 0.000023 - momentum: 0.000000
2023-10-15 18:00:16,603 epoch 4 - iter 1042/5212 - loss 0.07461997 - time (sec): 51.25 - samples/sec: 1467.28 - lr: 0.000023 - momentum: 0.000000
2023-10-15 18:00:42,316 epoch 4 - iter 1563/5212 - loss 0.07589647 - time (sec): 76.96 - samples/sec: 1444.37 - lr: 0.000022 - momentum: 0.000000
2023-10-15 18:01:08,183 epoch 4 - iter 2084/5212 - loss 0.07950376 - time (sec): 102.83 - samples/sec: 1433.87 - lr: 0.000022 - momentum: 0.000000
2023-10-15 18:01:33,328 epoch 4 - iter 2605/5212 - loss 0.07820170 - time (sec): 127.98 - samples/sec: 1437.64 - lr: 0.000022 - momentum: 0.000000
2023-10-15 18:01:59,013 epoch 4 - iter 3126/5212 - loss 0.08102218 - time (sec): 153.66 - samples/sec: 1442.95 - lr: 0.000021 - momentum: 0.000000
2023-10-15 18:02:24,785 epoch 4 - iter 3647/5212 - loss 0.08157332 - time (sec): 179.43 - samples/sec: 1446.50 - lr: 0.000021 - momentum: 0.000000
2023-10-15 18:02:50,143 epoch 4 - iter 4168/5212 - loss 0.08239732 - time (sec): 204.79 - samples/sec: 1440.24 - lr: 0.000021 - momentum: 0.000000
2023-10-15 18:03:15,106 epoch 4 - iter 4689/5212 - loss 0.08200821 - time (sec): 229.75 - samples/sec: 1437.08 - lr: 0.000020 - momentum: 0.000000
2023-10-15 18:03:40,271 epoch 4 - iter 5210/5212 - loss 0.08251544 - time (sec): 254.92 - samples/sec: 1441.21 - lr: 0.000020 - momentum: 0.000000
2023-10-15 18:03:40,369 ----------------------------------------------------------------------------------------------------
2023-10-15 18:03:40,369 EPOCH 4 done: loss 0.0825 - lr: 0.000020
2023-10-15 18:03:48,838 DEV : loss 0.35853323340415955 - f1-score (micro avg)  0.3579
2023-10-15 18:03:48,867 ----------------------------------------------------------------------------------------------------
2023-10-15 18:04:15,944 epoch 5 - iter 521/5212 - loss 0.06074459 - time (sec): 27.07 - samples/sec: 1474.06 - lr: 0.000020 - momentum: 0.000000
2023-10-15 18:04:41,310 epoch 5 - iter 1042/5212 - loss 0.05750324 - time (sec): 52.44 - samples/sec: 1450.08 - lr: 0.000019 - momentum: 0.000000
2023-10-15 18:05:06,545 epoch 5 - iter 1563/5212 - loss 0.06019844 - time (sec): 77.68 - samples/sec: 1440.58 - lr: 0.000019 - momentum: 0.000000
2023-10-15 18:05:31,865 epoch 5 - iter 2084/5212 - loss 0.05797686 - time (sec): 103.00 - samples/sec: 1457.52 - lr: 0.000019 - momentum: 0.000000
2023-10-15 18:05:56,701 epoch 5 - iter 2605/5212 - loss 0.05779824 - time (sec): 127.83 - samples/sec: 1463.83 - lr: 0.000018 - momentum: 0.000000
2023-10-15 18:06:21,368 epoch 5 - iter 3126/5212 - loss 0.05796520 - time (sec): 152.50 - samples/sec: 1458.09 - lr: 0.000018 - momentum: 0.000000
2023-10-15 18:06:46,164 epoch 5 - iter 3647/5212 - loss 0.05749706 - time (sec): 177.30 - samples/sec: 1447.84 - lr: 0.000018 - momentum: 0.000000
2023-10-15 18:07:11,414 epoch 5 - iter 4168/5212 - loss 0.05748009 - time (sec): 202.55 - samples/sec: 1451.08 - lr: 0.000017 - momentum: 0.000000
2023-10-15 18:07:36,488 epoch 5 - iter 4689/5212 - loss 0.05835980 - time (sec): 227.62 - samples/sec: 1453.58 - lr: 0.000017 - momentum: 0.000000
2023-10-15 18:08:02,017 epoch 5 - iter 5210/5212 - loss 0.05917565 - time (sec): 253.15 - samples/sec: 1451.21 - lr: 0.000017 - momentum: 0.000000
2023-10-15 18:08:02,105 ----------------------------------------------------------------------------------------------------
2023-10-15 18:08:02,106 EPOCH 5 done: loss 0.0592 - lr: 0.000017
2023-10-15 18:08:10,370 DEV : loss 0.2954880893230438 - f1-score (micro avg)  0.3915
2023-10-15 18:08:10,397 saving best model
2023-10-15 18:08:10,852 ----------------------------------------------------------------------------------------------------
2023-10-15 18:08:36,370 epoch 6 - iter 521/5212 - loss 0.03566624 - time (sec): 25.51 - samples/sec: 1407.17 - lr: 0.000016 - momentum: 0.000000
2023-10-15 18:09:01,776 epoch 6 - iter 1042/5212 - loss 0.03735819 - time (sec): 50.92 - samples/sec: 1459.40 - lr: 0.000016 - momentum: 0.000000
2023-10-15 18:09:27,672 epoch 6 - iter 1563/5212 - loss 0.04010312 - time (sec): 76.82 - samples/sec: 1461.32 - lr: 0.000016 - momentum: 0.000000
2023-10-15 18:09:53,920 epoch 6 - iter 2084/5212 - loss 0.04272252 - time (sec): 103.06 - samples/sec: 1440.65 - lr: 0.000015 - momentum: 0.000000
2023-10-15 18:10:18,891 epoch 6 - iter 2605/5212 - loss 0.04658724 - time (sec): 128.04 - samples/sec: 1423.31 - lr: 0.000015 - momentum: 0.000000
2023-10-15 18:10:44,769 epoch 6 - iter 3126/5212 - loss 0.04608168 - time (sec): 153.91 - samples/sec: 1436.73 - lr: 0.000015 - momentum: 0.000000
2023-10-15 18:11:09,948 epoch 6 - iter 3647/5212 - loss 0.04595678 - time (sec): 179.09 - samples/sec: 1434.36 - lr: 0.000014 - momentum: 0.000000
2023-10-15 18:11:35,711 epoch 6 - iter 4168/5212 - loss 0.04599344 - time (sec): 204.85 - samples/sec: 1436.63 - lr: 0.000014 - momentum: 0.000000
2023-10-15 18:12:00,745 epoch 6 - iter 4689/5212 - loss 0.04455311 - time (sec): 229.89 - samples/sec: 1439.85 - lr: 0.000014 - momentum: 0.000000
2023-10-15 18:12:26,059 epoch 6 - iter 5210/5212 - loss 0.04410825 - time (sec): 255.20 - samples/sec: 1438.27 - lr: 0.000013 - momentum: 0.000000
2023-10-15 18:12:26,260 ----------------------------------------------------------------------------------------------------
2023-10-15 18:12:26,260 EPOCH 6 done: loss 0.0441 - lr: 0.000013
2023-10-15 18:12:34,571 DEV : loss 0.39353951811790466 - f1-score (micro avg)  0.3728
2023-10-15 18:12:34,601 ----------------------------------------------------------------------------------------------------
2023-10-15 18:12:59,435 epoch 7 - iter 521/5212 - loss 0.03179235 - time (sec): 24.83 - samples/sec: 1364.97 - lr: 0.000013 - momentum: 0.000000
2023-10-15 18:13:24,802 epoch 7 - iter 1042/5212 - loss 0.02701866 - time (sec): 50.20 - samples/sec: 1413.92 - lr: 0.000013 - momentum: 0.000000
2023-10-15 18:13:49,752 epoch 7 - iter 1563/5212 - loss 0.02758616 - time (sec): 75.15 - samples/sec: 1397.11 - lr: 0.000012 - momentum: 0.000000
2023-10-15 18:14:15,112 epoch 7 - iter 2084/5212 - loss 0.02597876 - time (sec): 100.51 - samples/sec: 1423.38 - lr: 0.000012 - momentum: 0.000000
2023-10-15 18:14:40,889 epoch 7 - iter 2605/5212 - loss 0.03022671 - time (sec): 126.29 - samples/sec: 1432.09 - lr: 0.000012 - momentum: 0.000000
2023-10-15 18:15:06,162 epoch 7 - iter 3126/5212 - loss 0.03044416 - time (sec): 151.56 - samples/sec: 1431.88 - lr: 0.000011 - momentum: 0.000000
2023-10-15 18:15:31,480 epoch 7 - iter 3647/5212 - loss 0.03128317 - time (sec): 176.88 - samples/sec: 1442.88 - lr: 0.000011 - momentum: 0.000000
2023-10-15 18:15:57,689 epoch 7 - iter 4168/5212 - loss 0.03231927 - time (sec): 203.09 - samples/sec: 1441.69 - lr: 0.000011 - momentum: 0.000000
2023-10-15 18:16:24,016 epoch 7 - iter 4689/5212 - loss 0.03123865 - time (sec): 229.41 - samples/sec: 1435.67 - lr: 0.000010 - momentum: 0.000000
2023-10-15 18:16:49,654 epoch 7 - iter 5210/5212 - loss 0.03238048 - time (sec): 255.05 - samples/sec: 1440.47 - lr: 0.000010 - momentum: 0.000000
2023-10-15 18:16:49,745 ----------------------------------------------------------------------------------------------------
2023-10-15 18:16:49,745 EPOCH 7 done: loss 0.0324 - lr: 0.000010
2023-10-15 18:16:58,188 DEV : loss 0.4698057472705841 - f1-score (micro avg)  0.3445
2023-10-15 18:16:58,221 ----------------------------------------------------------------------------------------------------
2023-10-15 18:17:24,331 epoch 8 - iter 521/5212 - loss 0.02118882 - time (sec): 26.11 - samples/sec: 1507.33 - lr: 0.000010 - momentum: 0.000000
2023-10-15 18:17:50,035 epoch 8 - iter 1042/5212 - loss 0.02174828 - time (sec): 51.81 - samples/sec: 1501.03 - lr: 0.000009 - momentum: 0.000000
2023-10-15 18:18:15,674 epoch 8 - iter 1563/5212 - loss 0.02240042 - time (sec): 77.45 - samples/sec: 1484.23 - lr: 0.000009 - momentum: 0.000000
2023-10-15 18:18:40,960 epoch 8 - iter 2084/5212 - loss 0.02307800 - time (sec): 102.74 - samples/sec: 1458.93 - lr: 0.000009 - momentum: 0.000000
2023-10-15 18:19:06,094 epoch 8 - iter 2605/5212 - loss 0.02270835 - time (sec): 127.87 - samples/sec: 1448.41 - lr: 0.000008 - momentum: 0.000000
2023-10-15 18:19:31,710 epoch 8 - iter 3126/5212 - loss 0.02207908 - time (sec): 153.49 - samples/sec: 1443.14 - lr: 0.000008 - momentum: 0.000000
2023-10-15 18:19:57,514 epoch 8 - iter 3647/5212 - loss 0.02166312 - time (sec): 179.29 - samples/sec: 1447.94 - lr: 0.000008 - momentum: 0.000000
2023-10-15 18:20:22,251 epoch 8 - iter 4168/5212 - loss 0.02233639 - time (sec): 204.03 - samples/sec: 1441.81 - lr: 0.000007 - momentum: 0.000000
2023-10-15 18:20:47,299 epoch 8 - iter 4689/5212 - loss 0.02246766 - time (sec): 229.08 - samples/sec: 1441.54 - lr: 0.000007 - momentum: 0.000000
2023-10-15 18:21:11,946 epoch 8 - iter 5210/5212 - loss 0.02273107 - time (sec): 253.72 - samples/sec: 1446.48 - lr: 0.000007 - momentum: 0.000000
2023-10-15 18:21:12,056 ----------------------------------------------------------------------------------------------------
2023-10-15 18:21:12,056 EPOCH 8 done: loss 0.0227 - lr: 0.000007
2023-10-15 18:21:21,123 DEV : loss 0.48737385869026184 - f1-score (micro avg)  0.3614
2023-10-15 18:21:21,155 ----------------------------------------------------------------------------------------------------
2023-10-15 18:21:45,925 epoch 9 - iter 521/5212 - loss 0.01657988 - time (sec): 24.77 - samples/sec: 1436.79 - lr: 0.000006 - momentum: 0.000000
2023-10-15 18:22:11,071 epoch 9 - iter 1042/5212 - loss 0.01825306 - time (sec): 49.92 - samples/sec: 1435.80 - lr: 0.000006 - momentum: 0.000000
2023-10-15 18:22:35,983 epoch 9 - iter 1563/5212 - loss 0.01537751 - time (sec): 74.83 - samples/sec: 1446.59 - lr: 0.000006 - momentum: 0.000000
2023-10-15 18:23:01,098 epoch 9 - iter 2084/5212 - loss 0.01556559 - time (sec): 99.94 - samples/sec: 1448.93 - lr: 0.000005 - momentum: 0.000000
2023-10-15 18:23:26,191 epoch 9 - iter 2605/5212 - loss 0.01597811 - time (sec): 125.04 - samples/sec: 1441.42 - lr: 0.000005 - momentum: 0.000000
2023-10-15 18:23:52,046 epoch 9 - iter 3126/5212 - loss 0.01561272 - time (sec): 150.89 - samples/sec: 1451.99 - lr: 0.000005 - momentum: 0.000000
2023-10-15 18:24:16,900 epoch 9 - iter 3647/5212 - loss 0.01574407 - time (sec): 175.74 - samples/sec: 1453.65 - lr: 0.000004 - momentum: 0.000000
2023-10-15 18:24:42,461 epoch 9 - iter 4168/5212 - loss 0.01612832 - time (sec): 201.31 - samples/sec: 1448.57 - lr: 0.000004 - momentum: 0.000000
2023-10-15 18:25:08,164 epoch 9 - iter 4689/5212 - loss 0.01585164 - time (sec): 227.01 - samples/sec: 1452.94 - lr: 0.000004 - momentum: 0.000000
2023-10-15 18:25:33,956 epoch 9 - iter 5210/5212 - loss 0.01523723 - time (sec): 252.80 - samples/sec: 1452.97 - lr: 0.000003 - momentum: 0.000000
2023-10-15 18:25:34,044 ----------------------------------------------------------------------------------------------------
2023-10-15 18:25:34,044 EPOCH 9 done: loss 0.0152 - lr: 0.000003
2023-10-15 18:25:43,065 DEV : loss 0.5062488317489624 - f1-score (micro avg)  0.3655
2023-10-15 18:25:43,093 ----------------------------------------------------------------------------------------------------
2023-10-15 18:26:08,163 epoch 10 - iter 521/5212 - loss 0.00931166 - time (sec): 25.07 - samples/sec: 1432.29 - lr: 0.000003 - momentum: 0.000000
2023-10-15 18:26:33,323 epoch 10 - iter 1042/5212 - loss 0.01081890 - time (sec): 50.23 - samples/sec: 1458.64 - lr: 0.000003 - momentum: 0.000000
2023-10-15 18:26:58,431 epoch 10 - iter 1563/5212 - loss 0.01002542 - time (sec): 75.34 - samples/sec: 1467.77 - lr: 0.000002 - momentum: 0.000000
2023-10-15 18:27:24,060 epoch 10 - iter 2084/5212 - loss 0.00973675 - time (sec): 100.97 - samples/sec: 1456.67 - lr: 0.000002 - momentum: 0.000000
2023-10-15 18:27:49,708 epoch 10 - iter 2605/5212 - loss 0.01027887 - time (sec): 126.61 - samples/sec: 1466.36 - lr: 0.000002 - momentum: 0.000000
2023-10-15 18:28:15,215 epoch 10 - iter 3126/5212 - loss 0.01040973 - time (sec): 152.12 - samples/sec: 1460.95 - lr: 0.000001 - momentum: 0.000000
2023-10-15 18:28:40,464 epoch 10 - iter 3647/5212 - loss 0.01009036 - time (sec): 177.37 - samples/sec: 1459.94 - lr: 0.000001 - momentum: 0.000000
2023-10-15 18:29:05,142 epoch 10 - iter 4168/5212 - loss 0.01032693 - time (sec): 202.05 - samples/sec: 1454.33 - lr: 0.000001 - momentum: 0.000000
2023-10-15 18:29:30,640 epoch 10 - iter 4689/5212 - loss 0.01050964 - time (sec): 227.55 - samples/sec: 1448.72 - lr: 0.000000 - momentum: 0.000000
2023-10-15 18:29:55,882 epoch 10 - iter 5210/5212 - loss 0.01022552 - time (sec): 252.79 - samples/sec: 1452.20 - lr: 0.000000 - momentum: 0.000000
2023-10-15 18:29:56,017 ----------------------------------------------------------------------------------------------------
2023-10-15 18:29:56,017 EPOCH 10 done: loss 0.0102 - lr: 0.000000
2023-10-15 18:30:05,188 DEV : loss 0.5002045631408691 - f1-score (micro avg)  0.374
2023-10-15 18:30:05,584 ----------------------------------------------------------------------------------------------------
2023-10-15 18:30:05,585 Loading model from best epoch ...
2023-10-15 18:30:07,219 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
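The tag dictionary above follows the BIOES scheme: each of the four entity types (LOC, PER, ORG, HumanProd) gets Single/Begin/End/Inside variants, plus the shared O tag, giving 4 × 4 + 1 = 17 tags. A minimal decoder for such a sequence (an illustrative sketch, not Flair's internal implementation) looks like:

```python
# Minimal BIOES span decoder (illustrative only).
def decode_bioes(tags):
    """Convert a BIOES tag sequence into (start, end, label) spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":                          # single-token entity
            spans.append((i, i, label))
            start = None
        elif prefix == "B":                        # entity begins
            start = i
        elif prefix == "E" and start is not None:  # entity ends
            spans.append((start, i, label))
            start = None
        # "I" continues a running entity; nothing to record yet
    return spans

# decode_bioes(["S-LOC", "O", "B-PER", "I-PER", "E-PER"])
# yields (0, 0, "LOC") and (2, 4, "PER")
```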
2023-10-15 18:30:22,520 
Results:
- F-score (micro) 0.4665
- F-score (macro) 0.3023
- Accuracy 0.3084

By class:
              precision    recall  f1-score   support

         LOC     0.5410    0.5865    0.5628      1214
         PER     0.4010    0.4035    0.4022       808
         ORG     0.3038    0.2040    0.2441       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4686    0.4644    0.4665      2390
   macro avg     0.3115    0.2985    0.3023      2390
weighted avg     0.4553    0.4644    0.4579      2390

2023-10-15 18:30:22,520 ----------------------------------------------------------------------------------------------------
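The aggregate scores in the report can be cross-checked from the per-class rows. The sketch below does so from the printed (rounded) values, so the micro figures only agree to about three decimals: macro F1 is the unweighted mean of per-class F1, while micro scores pool true positives and predictions across all classes.

```python
# Recompute the aggregate scores from the per-class table above
# (values as printed, i.e. rounded, so results match only approximately).
per_class = {
    # label: (precision, recall, f1, support)
    "LOC":       (0.5410, 0.5865, 0.5628, 1214),
    "PER":       (0.4010, 0.4035, 0.4022,  808),
    "ORG":       (0.3038, 0.2040, 0.2441,  353),
    "HumanProd": (0.0000, 0.0000, 0.0000,   15),
}

# Macro average: unweighted mean of per-class F1 -> ~0.3023.
macro_f1 = sum(f1 for _, _, f1, _ in per_class.values()) / len(per_class)

# Micro average: pool counts across classes.
tp = sum(r * s for _, r, _, s in per_class.values())           # correct spans
pred = sum((r * s) / p for p, r, _, s in per_class.values() if p > 0)
total = sum(s for *_, s in per_class.values())                 # 2390 gold spans

micro_p = tp / pred                                            # ~0.4686
micro_r = tp / total                                           # ~0.4644
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)         # ~0.4665
```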