stefan-it committed
Commit
371eec5
1 Parent(s): 439a319

Upload ./training.log with huggingface_hub

Files changed (1)
  1. training.log +512 -0
training.log ADDED
@@ -0,0 +1,512 @@
+ 2023-10-23 22:30:17,229 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:30:17,230 Model: "SequenceTagger(
+   (embeddings): TransformerWordEmbeddings(
+     (model): BertModel(
+       (embeddings): BertEmbeddings(
+         (word_embeddings): Embedding(64001, 768)
+         (position_embeddings): Embedding(512, 768)
+         (token_type_embeddings): Embedding(2, 768)
+         (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+       (encoder): BertEncoder(
+         (layer): ModuleList(
+           (0-11): 12 x BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+         )
+       )
+       (pooler): BertPooler(
+         (dense): Linear(in_features=768, out_features=768, bias=True)
+         (activation): Tanh()
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=768, out_features=21, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-23 22:30:17,230 ----------------------------------------------------------------------------------------------------
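For reference, a model with this architecture can be assembled in Flair roughly as follows. This is a minimal sketch, not the training script behind this commit: the corpus loader and argument values are assumptions inferred from the log and from the base path reported further below.

    from flair.datasets import NER_HIPE_2022
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger

    # HIPE-2020 German corpus, as reported in the corpus summary below
    corpus = NER_HIPE_2022(dataset_name="hipe2020", language="de")
    label_dict = corpus.make_label_dictionary(label_type="ner")

    # Matches the printed BertModel: hmBERT 64k checkpoint, last layer ("-1"),
    # first-subtoken pooling (see "poolingfirst-layers-1" in the base path)
    embeddings = TransformerWordEmbeddings(
        model="dbmdz/bert-base-historic-multilingual-64k-td-cased",
        layers="-1",
        subtoken_pooling="first",
        fine_tune=True,
    )

    # No CRF and no RNN: LockedDropout(0.5) plus a single Linear(768 -> 21) head
    tagger = SequenceTagger(
        hidden_size=256,  # unused here since use_rnn=False
        embeddings=embeddings,
        tag_dictionary=label_dict,
        tag_type="ner",
        use_crf=False,
        use_rnn=False,
        reproject_embeddings=False,
    )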
+ 2023-10-23 22:30:17,230 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
+  - NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /home/ubuntu/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
+ 2023-10-23 22:30:17,230 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:30:17,230 Train: 3575 sentences
+ 2023-10-23 22:30:17,230 (train_with_dev=False, train_with_test=False)
+ 2023-10-23 22:30:17,230 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:30:17,230 Training Params:
+ 2023-10-23 22:30:17,230 - learning_rate: "5e-05"
+ 2023-10-23 22:30:17,230 - mini_batch_size: "8"
+ 2023-10-23 22:30:17,230 - max_epochs: "10"
+ 2023-10-23 22:30:17,230 - shuffle: "True"
+ 2023-10-23 22:30:17,230 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:30:17,230 Plugins:
+ 2023-10-23 22:30:17,230 - TensorboardLogger
+ 2023-10-23 22:30:17,230 - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-23 22:30:17,230 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:30:17,230 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-23 22:30:17,230 - metric: "('micro avg', 'f1-score')"
+ 2023-10-23 22:30:17,230 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:30:17,230 Computation:
+ 2023-10-23 22:30:17,230 - compute on device: cuda:0
+ 2023-10-23 22:30:17,230 - embedding storage: none
+ 2023-10-23 22:30:17,231 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:30:17,231 Model training base path: "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-4"
+ 2023-10-23 22:30:17,231 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:30:17,231 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:30:17,231 Logging anything other than scalars to TensorBoard is currently not supported.
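A fine-tuning call matching the parameters above would look roughly like this, continuing the sketch from earlier under the same assumptions. In recent Flair versions, ModelTrainer.fine_tune attaches a linear warmup/decay scheduler itself, which is consistent with the LinearScheduler plugin with warmup_fraction 0.1 reported above.

    from flair.trainers import ModelTrainer

    trainer = ModelTrainer(tagger, corpus)
    trainer.fine_tune(
        "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-4",
        learning_rate=5e-5,  # "5e-05" above
        mini_batch_size=8,   # "8" above
        max_epochs=10,       # "10" above
    )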
+ 2023-10-23 22:30:21,267 epoch 1 - iter 44/447 - loss 2.61747162 - time (sec): 4.04 - samples/sec: 2035.61 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-23 22:30:25,388 epoch 1 - iter 88/447 - loss 1.60840445 - time (sec): 8.16 - samples/sec: 2091.49 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-23 22:30:29,288 epoch 1 - iter 132/447 - loss 1.22488104 - time (sec): 12.06 - samples/sec: 2090.31 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-23 22:30:33,068 epoch 1 - iter 176/447 - loss 1.02876906 - time (sec): 15.84 - samples/sec: 2108.24 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-23 22:30:36,953 epoch 1 - iter 220/447 - loss 0.87744594 - time (sec): 19.72 - samples/sec: 2129.38 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-23 22:30:40,666 epoch 1 - iter 264/447 - loss 0.77450690 - time (sec): 23.43 - samples/sec: 2142.27 - lr: 0.000029 - momentum: 0.000000
+ 2023-10-23 22:30:44,600 epoch 1 - iter 308/447 - loss 0.69356769 - time (sec): 27.37 - samples/sec: 2149.59 - lr: 0.000034 - momentum: 0.000000
+ 2023-10-23 22:30:48,495 epoch 1 - iter 352/447 - loss 0.62969553 - time (sec): 31.26 - samples/sec: 2153.68 - lr: 0.000039 - momentum: 0.000000
+ 2023-10-23 22:30:52,331 epoch 1 - iter 396/447 - loss 0.58382135 - time (sec): 35.10 - samples/sec: 2154.72 - lr: 0.000044 - momentum: 0.000000
+ 2023-10-23 22:30:56,593 epoch 1 - iter 440/447 - loss 0.53971211 - time (sec): 39.36 - samples/sec: 2158.94 - lr: 0.000049 - momentum: 0.000000
+ 2023-10-23 22:30:57,344 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:30:57,344 EPOCH 1 done: loss 0.5354 - lr: 0.000049
+ 2023-10-23 22:31:02,145 DEV : loss 0.14619310200214386 - f1-score (micro avg) 0.6262
+ 2023-10-23 22:31:02,165 saving best model
+ 2023-10-23 22:31:02,636 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:31:06,348 epoch 2 - iter 44/447 - loss 0.13529306 - time (sec): 3.71 - samples/sec: 2152.18 - lr: 0.000049 - momentum: 0.000000
+ 2023-10-23 22:31:10,178 epoch 2 - iter 88/447 - loss 0.14040657 - time (sec): 7.54 - samples/sec: 2231.49 - lr: 0.000049 - momentum: 0.000000
+ 2023-10-23 22:31:14,578 epoch 2 - iter 132/447 - loss 0.14036495 - time (sec): 11.94 - samples/sec: 2176.35 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-23 22:31:18,453 epoch 2 - iter 176/447 - loss 0.14258655 - time (sec): 15.82 - samples/sec: 2166.64 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-23 22:31:22,524 epoch 2 - iter 220/447 - loss 0.14221669 - time (sec): 19.89 - samples/sec: 2156.71 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-23 22:31:26,576 epoch 2 - iter 264/447 - loss 0.13370512 - time (sec): 23.94 - samples/sec: 2131.60 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-23 22:31:30,355 epoch 2 - iter 308/447 - loss 0.13477548 - time (sec): 27.72 - samples/sec: 2134.27 - lr: 0.000046 - momentum: 0.000000
+ 2023-10-23 22:31:34,090 epoch 2 - iter 352/447 - loss 0.13109257 - time (sec): 31.45 - samples/sec: 2127.08 - lr: 0.000046 - momentum: 0.000000
+ 2023-10-23 22:31:38,417 epoch 2 - iter 396/447 - loss 0.13099059 - time (sec): 35.78 - samples/sec: 2127.77 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-23 22:31:42,421 epoch 2 - iter 440/447 - loss 0.12705414 - time (sec): 39.78 - samples/sec: 2138.94 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-23 22:31:43,024 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:31:43,024 EPOCH 2 done: loss 0.1267 - lr: 0.000045
+ 2023-10-23 22:31:49,495 DEV : loss 0.13422338664531708 - f1-score (micro avg) 0.6981
+ 2023-10-23 22:31:49,516 saving best model
+ 2023-10-23 22:31:50,207 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:31:54,081 epoch 3 - iter 44/447 - loss 0.06517716 - time (sec): 3.87 - samples/sec: 2019.03 - lr: 0.000044 - momentum: 0.000000
+ 2023-10-23 22:31:58,238 epoch 3 - iter 88/447 - loss 0.06909628 - time (sec): 8.03 - samples/sec: 2017.65 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-23 22:32:01,998 epoch 3 - iter 132/447 - loss 0.06970190 - time (sec): 11.79 - samples/sec: 2077.66 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-23 22:32:06,132 epoch 3 - iter 176/447 - loss 0.07724232 - time (sec): 15.92 - samples/sec: 2113.01 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-23 22:32:09,840 epoch 3 - iter 220/447 - loss 0.07409603 - time (sec): 19.63 - samples/sec: 2092.92 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-23 22:32:14,305 epoch 3 - iter 264/447 - loss 0.07522591 - time (sec): 24.10 - samples/sec: 2087.35 - lr: 0.000041 - momentum: 0.000000
+ 2023-10-23 22:32:18,612 epoch 3 - iter 308/447 - loss 0.07450279 - time (sec): 28.40 - samples/sec: 2095.97 - lr: 0.000041 - momentum: 0.000000
+ 2023-10-23 22:32:22,409 epoch 3 - iter 352/447 - loss 0.07276381 - time (sec): 32.20 - samples/sec: 2114.52 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-23 22:32:26,239 epoch 3 - iter 396/447 - loss 0.07476661 - time (sec): 36.03 - samples/sec: 2124.32 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-23 22:32:30,277 epoch 3 - iter 440/447 - loss 0.07597918 - time (sec): 40.07 - samples/sec: 2127.19 - lr: 0.000039 - momentum: 0.000000
+ 2023-10-23 22:32:30,858 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:32:30,859 EPOCH 3 done: loss 0.0758 - lr: 0.000039
+ 2023-10-23 22:32:37,348 DEV : loss 0.13163481652736664 - f1-score (micro avg) 0.7203
+ 2023-10-23 22:32:37,368 saving best model
+ 2023-10-23 22:32:37,995 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:32:41,892 epoch 4 - iter 44/447 - loss 0.04397609 - time (sec): 3.90 - samples/sec: 2153.22 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-23 22:32:45,651 epoch 4 - iter 88/447 - loss 0.04709654 - time (sec): 7.66 - samples/sec: 2137.42 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-23 22:32:49,533 epoch 4 - iter 132/447 - loss 0.04593196 - time (sec): 11.54 - samples/sec: 2166.47 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-23 22:32:53,857 epoch 4 - iter 176/447 - loss 0.04818047 - time (sec): 15.86 - samples/sec: 2160.62 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-23 22:32:58,108 epoch 4 - iter 220/447 - loss 0.04713671 - time (sec): 20.11 - samples/sec: 2134.79 - lr: 0.000036 - momentum: 0.000000
+ 2023-10-23 22:33:01,959 epoch 4 - iter 264/447 - loss 0.04951053 - time (sec): 23.96 - samples/sec: 2136.25 - lr: 0.000036 - momentum: 0.000000
+ 2023-10-23 22:33:05,688 epoch 4 - iter 308/447 - loss 0.04805972 - time (sec): 27.69 - samples/sec: 2144.03 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-23 22:33:09,695 epoch 4 - iter 352/447 - loss 0.04776840 - time (sec): 31.70 - samples/sec: 2140.57 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-23 22:33:13,577 epoch 4 - iter 396/447 - loss 0.04732331 - time (sec): 35.58 - samples/sec: 2137.75 - lr: 0.000034 - momentum: 0.000000
+ 2023-10-23 22:33:17,808 epoch 4 - iter 440/447 - loss 0.04667565 - time (sec): 39.81 - samples/sec: 2138.30 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-23 22:33:18,478 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:33:18,479 EPOCH 4 done: loss 0.0468 - lr: 0.000033
+ 2023-10-23 22:33:24,957 DEV : loss 0.15146000683307648 - f1-score (micro avg) 0.739
+ 2023-10-23 22:33:24,977 saving best model
+ 2023-10-23 22:33:25,573 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:33:29,808 epoch 5 - iter 44/447 - loss 0.01668860 - time (sec): 4.23 - samples/sec: 2114.97 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-23 22:33:33,878 epoch 5 - iter 88/447 - loss 0.02476922 - time (sec): 8.30 - samples/sec: 2093.66 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-23 22:33:37,628 epoch 5 - iter 132/447 - loss 0.02649246 - time (sec): 12.05 - samples/sec: 2115.90 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-23 22:33:41,750 epoch 5 - iter 176/447 - loss 0.03062251 - time (sec): 16.18 - samples/sec: 2133.69 - lr: 0.000031 - momentum: 0.000000
+ 2023-10-23 22:33:46,044 epoch 5 - iter 220/447 - loss 0.02841129 - time (sec): 20.47 - samples/sec: 2159.21 - lr: 0.000031 - momentum: 0.000000
+ 2023-10-23 22:33:49,757 epoch 5 - iter 264/447 - loss 0.02981830 - time (sec): 24.18 - samples/sec: 2148.21 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-23 22:33:53,930 epoch 5 - iter 308/447 - loss 0.03059182 - time (sec): 28.36 - samples/sec: 2136.13 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-23 22:33:57,668 epoch 5 - iter 352/447 - loss 0.03094377 - time (sec): 32.09 - samples/sec: 2139.50 - lr: 0.000029 - momentum: 0.000000
+ 2023-10-23 22:34:01,555 epoch 5 - iter 396/447 - loss 0.02981108 - time (sec): 35.98 - samples/sec: 2131.11 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-23 22:34:05,425 epoch 5 - iter 440/447 - loss 0.02945931 - time (sec): 39.85 - samples/sec: 2135.15 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-23 22:34:06,116 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:34:06,117 EPOCH 5 done: loss 0.0293 - lr: 0.000028
+ 2023-10-23 22:34:12,590 DEV : loss 0.22155629098415375 - f1-score (micro avg) 0.7493
+ 2023-10-23 22:34:12,611 saving best model
+ 2023-10-23 22:34:13,209 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:34:16,768 epoch 6 - iter 44/447 - loss 0.01815000 - time (sec): 3.56 - samples/sec: 2090.34 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-23 22:34:20,690 epoch 6 - iter 88/447 - loss 0.01807258 - time (sec): 7.48 - samples/sec: 2124.94 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-23 22:34:24,827 epoch 6 - iter 132/447 - loss 0.02179131 - time (sec): 11.62 - samples/sec: 2158.82 - lr: 0.000026 - momentum: 0.000000
+ 2023-10-23 22:34:28,893 epoch 6 - iter 176/447 - loss 0.02160888 - time (sec): 15.68 - samples/sec: 2146.17 - lr: 0.000026 - momentum: 0.000000
+ 2023-10-23 22:34:33,305 epoch 6 - iter 220/447 - loss 0.02172418 - time (sec): 20.09 - samples/sec: 2147.11 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-23 22:34:37,007 epoch 6 - iter 264/447 - loss 0.02170470 - time (sec): 23.80 - samples/sec: 2151.46 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-23 22:34:41,178 epoch 6 - iter 308/447 - loss 0.02234828 - time (sec): 27.97 - samples/sec: 2144.67 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-23 22:34:45,340 epoch 6 - iter 352/447 - loss 0.02114853 - time (sec): 32.13 - samples/sec: 2131.89 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-23 22:34:49,332 epoch 6 - iter 396/447 - loss 0.02105417 - time (sec): 36.12 - samples/sec: 2125.31 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-23 22:34:53,202 epoch 6 - iter 440/447 - loss 0.01999840 - time (sec): 39.99 - samples/sec: 2126.53 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-23 22:34:53,908 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:34:53,908 EPOCH 6 done: loss 0.0199 - lr: 0.000022
+ 2023-10-23 22:35:00,402 DEV : loss 0.2293727993965149 - f1-score (micro avg) 0.7686
+ 2023-10-23 22:35:00,423 saving best model
+ 2023-10-23 22:35:01,014 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:35:05,341 epoch 7 - iter 44/447 - loss 0.00980116 - time (sec): 4.33 - samples/sec: 2168.37 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-23 22:35:09,245 epoch 7 - iter 88/447 - loss 0.01146217 - time (sec): 8.23 - samples/sec: 2133.32 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-23 22:35:13,013 epoch 7 - iter 132/447 - loss 0.01089331 - time (sec): 12.00 - samples/sec: 2113.50 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-23 22:35:16,734 epoch 7 - iter 176/447 - loss 0.01100412 - time (sec): 15.72 - samples/sec: 2103.62 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-23 22:35:20,770 epoch 7 - iter 220/447 - loss 0.01260891 - time (sec): 19.76 - samples/sec: 2113.68 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-23 22:35:24,654 epoch 7 - iter 264/447 - loss 0.01499648 - time (sec): 23.64 - samples/sec: 2103.53 - lr: 0.000019 - momentum: 0.000000
+ 2023-10-23 22:35:28,932 epoch 7 - iter 308/447 - loss 0.01520584 - time (sec): 27.92 - samples/sec: 2115.87 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-23 22:35:33,256 epoch 7 - iter 352/447 - loss 0.01474127 - time (sec): 32.24 - samples/sec: 2134.52 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-23 22:35:37,126 epoch 7 - iter 396/447 - loss 0.01433985 - time (sec): 36.11 - samples/sec: 2131.72 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-23 22:35:40,902 epoch 7 - iter 440/447 - loss 0.01344578 - time (sec): 39.89 - samples/sec: 2131.24 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-23 22:35:41,529 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:35:41,529 EPOCH 7 done: loss 0.0133 - lr: 0.000017
+ 2023-10-23 22:35:48,021 DEV : loss 0.2617715001106262 - f1-score (micro avg) 0.7712
+ 2023-10-23 22:35:48,042 saving best model
+ 2023-10-23 22:35:48,634 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:35:52,644 epoch 8 - iter 44/447 - loss 0.00739272 - time (sec): 4.01 - samples/sec: 2124.55 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-23 22:35:56,893 epoch 8 - iter 88/447 - loss 0.00595935 - time (sec): 8.26 - samples/sec: 2069.00 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-23 22:36:00,775 epoch 8 - iter 132/447 - loss 0.00894853 - time (sec): 12.14 - samples/sec: 2110.05 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-23 22:36:05,398 epoch 8 - iter 176/447 - loss 0.00734796 - time (sec): 16.76 - samples/sec: 2104.25 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-23 22:36:09,056 epoch 8 - iter 220/447 - loss 0.00672069 - time (sec): 20.42 - samples/sec: 2108.20 - lr: 0.000014 - momentum: 0.000000
+ 2023-10-23 22:36:12,942 epoch 8 - iter 264/447 - loss 0.00679148 - time (sec): 24.31 - samples/sec: 2124.47 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-23 22:36:16,667 epoch 8 - iter 308/447 - loss 0.00768702 - time (sec): 28.03 - samples/sec: 2133.16 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-23 22:36:20,306 epoch 8 - iter 352/447 - loss 0.00760444 - time (sec): 31.67 - samples/sec: 2125.31 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-23 22:36:24,266 epoch 8 - iter 396/447 - loss 0.00734192 - time (sec): 35.63 - samples/sec: 2129.40 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-23 22:36:28,646 epoch 8 - iter 440/447 - loss 0.00806862 - time (sec): 40.01 - samples/sec: 2128.80 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-23 22:36:29,337 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:36:29,337 EPOCH 8 done: loss 0.0080 - lr: 0.000011
+ 2023-10-23 22:36:35,571 DEV : loss 0.2736358642578125 - f1-score (micro avg) 0.7733
+ 2023-10-23 22:36:35,592 saving best model
+ 2023-10-23 22:36:36,185 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:36:39,878 epoch 9 - iter 44/447 - loss 0.00454046 - time (sec): 3.69 - samples/sec: 2163.58 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-23 22:36:44,012 epoch 9 - iter 88/447 - loss 0.00391630 - time (sec): 7.83 - samples/sec: 2060.42 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-23 22:36:47,842 epoch 9 - iter 132/447 - loss 0.00352831 - time (sec): 11.66 - samples/sec: 2119.86 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-23 22:36:51,495 epoch 9 - iter 176/447 - loss 0.00324014 - time (sec): 15.31 - samples/sec: 2143.80 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-23 22:36:55,486 epoch 9 - iter 220/447 - loss 0.00376207 - time (sec): 19.30 - samples/sec: 2145.27 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-23 22:36:59,856 epoch 9 - iter 264/447 - loss 0.00521423 - time (sec): 23.67 - samples/sec: 2153.42 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-23 22:37:03,961 epoch 9 - iter 308/447 - loss 0.00537886 - time (sec): 27.77 - samples/sec: 2146.14 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-23 22:37:08,356 epoch 9 - iter 352/447 - loss 0.00484718 - time (sec): 32.17 - samples/sec: 2142.65 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-23 22:37:12,275 epoch 9 - iter 396/447 - loss 0.00487190 - time (sec): 36.09 - samples/sec: 2132.68 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-23 22:37:16,222 epoch 9 - iter 440/447 - loss 0.00525891 - time (sec): 40.04 - samples/sec: 2131.23 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-23 22:37:16,845 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:37:16,846 EPOCH 9 done: loss 0.0052 - lr: 0.000006
+ 2023-10-23 22:37:23,060 DEV : loss 0.2881031036376953 - f1-score (micro avg) 0.7758
+ 2023-10-23 22:37:23,081 saving best model
+ 2023-10-23 22:37:23,683 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:37:27,975 epoch 10 - iter 44/447 - loss 0.00315695 - time (sec): 4.29 - samples/sec: 2066.95 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-23 22:37:32,248 epoch 10 - iter 88/447 - loss 0.00274169 - time (sec): 8.56 - samples/sec: 2001.47 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-23 22:37:35,940 epoch 10 - iter 132/447 - loss 0.00183972 - time (sec): 12.26 - samples/sec: 2099.04 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-23 22:37:40,088 epoch 10 - iter 176/447 - loss 0.00256280 - time (sec): 16.40 - samples/sec: 2122.84 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-23 22:37:43,873 epoch 10 - iter 220/447 - loss 0.00354318 - time (sec): 20.19 - samples/sec: 2121.91 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-23 22:37:47,546 epoch 10 - iter 264/447 - loss 0.00354710 - time (sec): 23.86 - samples/sec: 2127.10 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-23 22:37:51,706 epoch 10 - iter 308/447 - loss 0.00327233 - time (sec): 28.02 - samples/sec: 2126.72 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-23 22:37:55,430 epoch 10 - iter 352/447 - loss 0.00332028 - time (sec): 31.75 - samples/sec: 2116.46 - lr: 0.000001 - momentum: 0.000000
+ 2023-10-23 22:37:59,496 epoch 10 - iter 396/447 - loss 0.00374741 - time (sec): 35.81 - samples/sec: 2124.40 - lr: 0.000001 - momentum: 0.000000
+ 2023-10-23 22:38:03,509 epoch 10 - iter 440/447 - loss 0.00354295 - time (sec): 39.82 - samples/sec: 2115.19 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-23 22:38:04,545 ----------------------------------------------------------------------------------------------------
+ 2023-10-23 22:38:04,546 EPOCH 10 done: loss 0.0035 - lr: 0.000000
+ 2023-10-23 22:38:10,759 DEV : loss 0.2901349365711212 - f1-score (micro avg) 0.7753
+ 2023-10-23 22:38:11,256 ----------------------------------------------------------------------------------------------------
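The lr column above traces a linear warmup/decay schedule: it rises to the 5e-05 peak over the first 10% of the roughly 4470 total steps (all of epoch 1) and then decays linearly to zero. A small sketch of that schedule, assuming the standard formulation:

    def linear_lr(step: int, total_steps: int = 4470,
                  peak_lr: float = 5e-5, warmup_fraction: float = 0.1) -> float:
        # Linear warmup to peak_lr, then linear decay to zero.
        # 10 epochs of 447 iterations give ~4470 steps, so warmup_steps = 447.
        warmup_steps = int(total_steps * warmup_fraction)
        if step < warmup_steps:
            return peak_lr * step / warmup_steps
        return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

    # linear_lr(44) ~= 4.9e-06, matching "epoch 1 - iter 44/447 - ... lr: 0.000005"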
+ 2023-10-23 22:38:11,257 Loading model from best epoch ...
+ 2023-10-23 22:38:13,012 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
+ 2023-10-23 22:38:17,859
+ Results:
+ - F-score (micro) 0.751
+ - F-score (macro) 0.6724
+ - Accuracy 0.6218
+
+ By class:
+               precision    recall  f1-score   support
+
+          loc     0.8395    0.8423    0.8409       596
+         pers     0.6658    0.7778    0.7175       333
+          org     0.5588    0.4318    0.4872       132
+         prod     0.6531    0.4848    0.5565        66
+         time     0.7451    0.7755    0.7600        49
+
+    micro avg     0.7468    0.7551    0.7510      1176
+    macro avg     0.6925    0.6624    0.6724      1176
+ weighted avg     0.7444    0.7551    0.7469      1176
+
+ 2023-10-23 22:38:17,859 ----------------------------------------------------------------------------------------------------
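The saved checkpoint can then be used for inference with a few lines of Flair. A sketch; the example sentence is made up:

    from flair.data import Sentence
    from flair.models import SequenceTagger

    # best-model.pt is stored under the training base path reported above
    tagger = SequenceTagger.load(
        "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-4/best-model.pt"
    )

    sentence = Sentence("Theodor Mommsen reiste im Herbst nach Zürich .")
    tagger.predict(sentence)
    for span in sentence.get_spans("ner"):
        print(span)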