2023-10-13 12:55:14,652 ----------------------------------------------------------------------------------------------------
2023-10-13 12:55:14,653 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 12:55:14,653 ----------------------------------------------------------------------------------------------------
2023-10-13 12:55:14,653 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
 - NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
2023-10-13 12:55:14,653 ----------------------------------------------------------------------------------------------------
2023-10-13 12:55:14,653 Train:  3575 sentences
2023-10-13 12:55:14,653         (train_with_dev=False, train_with_test=False)
2023-10-13 12:55:14,653 ----------------------------------------------------------------------------------------------------
2023-10-13 12:55:14,653 Training Params:
2023-10-13 12:55:14,653  - learning_rate: "3e-05" 
2023-10-13 12:55:14,653  - mini_batch_size: "4"
2023-10-13 12:55:14,653  - max_epochs: "10"
2023-10-13 12:55:14,653  - shuffle: "True"
2023-10-13 12:55:14,653 ----------------------------------------------------------------------------------------------------
2023-10-13 12:55:14,653 Plugins:
2023-10-13 12:55:14,654  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 12:55:14,654 ----------------------------------------------------------------------------------------------------
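The `lr` column in the iteration lines below fits a linear warmup over the first 10% of optimizer steps, followed by linear decay to zero. The sketch below reproduces several logged values under three assumptions: 894 steps per epoch (ceil(3575 / 4)), a global step counter that runs across epochs, and a `LinearScheduler` plugin that uses the standard warmup/decay formula. Flair's internal step counting may be off by one from this sketch, which would not change the printed values at six decimals.

```python
import math

# Values copied from the log header (corpus size and training params).
TRAIN_SENTENCES = 3575
MINI_BATCH_SIZE = 4
MAX_EPOCHS = 10
BASE_LR = 3e-5
WARMUP_FRACTION = 0.1

STEPS_PER_EPOCH = math.ceil(TRAIN_SENTENCES / MINI_BATCH_SIZE)  # 894 iterations
TOTAL_STEPS = STEPS_PER_EPOCH * MAX_EPOCHS                      # 8940
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_FRACTION)             # 894, i.e. all of epoch 1


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup to BASE_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


def global_step(epoch: int, iteration: int) -> int:
    """Convert the log's 1-based epoch and per-epoch iteration into a global step."""
    return (epoch - 1) * STEPS_PER_EPOCH + iteration


if __name__ == "__main__":
    for epoch, iteration in [(1, 89), (1, 890), (2, 178), (5, 894), (10, 890)]:
        # Formatted the same way as the log's "lr:" field.
        print(epoch, iteration, f"{lr_at(global_step(epoch, iteration)):.6f}")
```

The printed values (0.000003, 0.000030, 0.000029, 0.000017, 0.000000) match the corresponding `lr` entries above. This also explains why epoch 1 is exactly the warmup phase: 10% of 8940 steps is 894, one full epoch.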
2023-10-13 12:55:14,654 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 12:55:14,654  - metric: "('micro avg', 'f1-score')"
2023-10-13 12:55:14,654 ----------------------------------------------------------------------------------------------------
2023-10-13 12:55:14,654 Computation:
2023-10-13 12:55:14,654  - compute on device: cuda:0
2023-10-13 12:55:14,654  - embedding storage: none
2023-10-13 12:55:14,654 ----------------------------------------------------------------------------------------------------
2023-10-13 12:55:14,654 Model training base path: "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-3"
2023-10-13 12:55:14,654 ----------------------------------------------------------------------------------------------------
2023-10-13 12:55:14,654 ----------------------------------------------------------------------------------------------------
2023-10-13 12:55:18,848 epoch 1 - iter 89/894 - loss 2.69593370 - time (sec): 4.19 - samples/sec: 2000.73 - lr: 0.000003 - momentum: 0.000000
2023-10-13 12:55:23,146 epoch 1 - iter 178/894 - loss 1.71958865 - time (sec): 8.49 - samples/sec: 2028.76 - lr: 0.000006 - momentum: 0.000000
2023-10-13 12:55:27,200 epoch 1 - iter 267/894 - loss 1.32701386 - time (sec): 12.54 - samples/sec: 2010.55 - lr: 0.000009 - momentum: 0.000000
2023-10-13 12:55:31,451 epoch 1 - iter 356/894 - loss 1.05861260 - time (sec): 16.80 - samples/sec: 2078.11 - lr: 0.000012 - momentum: 0.000000
2023-10-13 12:55:35,742 epoch 1 - iter 445/894 - loss 0.90380884 - time (sec): 21.09 - samples/sec: 2070.33 - lr: 0.000015 - momentum: 0.000000
2023-10-13 12:55:39,944 epoch 1 - iter 534/894 - loss 0.79977481 - time (sec): 25.29 - samples/sec: 2079.85 - lr: 0.000018 - momentum: 0.000000
2023-10-13 12:55:44,011 epoch 1 - iter 623/894 - loss 0.73143982 - time (sec): 29.36 - samples/sec: 2066.38 - lr: 0.000021 - momentum: 0.000000
2023-10-13 12:55:48,084 epoch 1 - iter 712/894 - loss 0.68005300 - time (sec): 33.43 - samples/sec: 2056.86 - lr: 0.000024 - momentum: 0.000000
2023-10-13 12:55:52,305 epoch 1 - iter 801/894 - loss 0.63023691 - time (sec): 37.65 - samples/sec: 2070.97 - lr: 0.000027 - momentum: 0.000000
2023-10-13 12:55:56,579 epoch 1 - iter 890/894 - loss 0.59407354 - time (sec): 41.92 - samples/sec: 2057.15 - lr: 0.000030 - momentum: 0.000000
2023-10-13 12:55:56,770 ----------------------------------------------------------------------------------------------------
2023-10-13 12:55:56,770 EPOCH 1 done: loss 0.5935 - lr: 0.000030
2023-10-13 12:56:01,600 DEV : loss 0.19242174923419952 - f1-score (micro avg)  0.581
2023-10-13 12:56:01,629 saving best model
2023-10-13 12:56:01,951 ----------------------------------------------------------------------------------------------------
2023-10-13 12:56:05,928 epoch 2 - iter 89/894 - loss 0.20351765 - time (sec): 3.98 - samples/sec: 2070.12 - lr: 0.000030 - momentum: 0.000000
2023-10-13 12:56:09,978 epoch 2 - iter 178/894 - loss 0.19371181 - time (sec): 8.03 - samples/sec: 2100.43 - lr: 0.000029 - momentum: 0.000000
2023-10-13 12:56:14,065 epoch 2 - iter 267/894 - loss 0.17774358 - time (sec): 12.11 - samples/sec: 2072.98 - lr: 0.000029 - momentum: 0.000000
2023-10-13 12:56:18,280 epoch 2 - iter 356/894 - loss 0.16603566 - time (sec): 16.33 - samples/sec: 2063.76 - lr: 0.000029 - momentum: 0.000000
2023-10-13 12:56:22,329 epoch 2 - iter 445/894 - loss 0.17216562 - time (sec): 20.38 - samples/sec: 2057.61 - lr: 0.000028 - momentum: 0.000000
2023-10-13 12:56:26,707 epoch 2 - iter 534/894 - loss 0.16847025 - time (sec): 24.75 - samples/sec: 2027.62 - lr: 0.000028 - momentum: 0.000000
2023-10-13 12:56:31,007 epoch 2 - iter 623/894 - loss 0.16459087 - time (sec): 29.06 - samples/sec: 2036.86 - lr: 0.000028 - momentum: 0.000000
2023-10-13 12:56:35,095 epoch 2 - iter 712/894 - loss 0.16278123 - time (sec): 33.14 - samples/sec: 2049.87 - lr: 0.000027 - momentum: 0.000000
2023-10-13 12:56:39,067 epoch 2 - iter 801/894 - loss 0.16215338 - time (sec): 37.11 - samples/sec: 2088.95 - lr: 0.000027 - momentum: 0.000000
2023-10-13 12:56:43,379 epoch 2 - iter 890/894 - loss 0.16148233 - time (sec): 41.43 - samples/sec: 2081.25 - lr: 0.000027 - momentum: 0.000000
2023-10-13 12:56:43,566 ----------------------------------------------------------------------------------------------------
2023-10-13 12:56:43,566 EPOCH 2 done: loss 0.1618 - lr: 0.000027
2023-10-13 12:56:52,106 DEV : loss 0.14601190388202667 - f1-score (micro avg)  0.707
2023-10-13 12:56:52,135 saving best model
2023-10-13 12:56:52,581 ----------------------------------------------------------------------------------------------------
2023-10-13 12:56:56,686 epoch 3 - iter 89/894 - loss 0.09364191 - time (sec): 4.10 - samples/sec: 1998.31 - lr: 0.000026 - momentum: 0.000000
2023-10-13 12:57:00,659 epoch 3 - iter 178/894 - loss 0.08926448 - time (sec): 8.07 - samples/sec: 1991.65 - lr: 0.000026 - momentum: 0.000000
2023-10-13 12:57:04,820 epoch 3 - iter 267/894 - loss 0.08910757 - time (sec): 12.23 - samples/sec: 2046.42 - lr: 0.000026 - momentum: 0.000000
2023-10-13 12:57:08,871 epoch 3 - iter 356/894 - loss 0.09229143 - time (sec): 16.29 - samples/sec: 2054.75 - lr: 0.000025 - momentum: 0.000000
2023-10-13 12:57:13,084 epoch 3 - iter 445/894 - loss 0.09330711 - time (sec): 20.50 - samples/sec: 2026.30 - lr: 0.000025 - momentum: 0.000000
2023-10-13 12:57:17,109 epoch 3 - iter 534/894 - loss 0.08872648 - time (sec): 24.52 - samples/sec: 2052.27 - lr: 0.000025 - momentum: 0.000000
2023-10-13 12:57:21,286 epoch 3 - iter 623/894 - loss 0.08943539 - time (sec): 28.70 - samples/sec: 2046.90 - lr: 0.000024 - momentum: 0.000000
2023-10-13 12:57:25,411 epoch 3 - iter 712/894 - loss 0.09181165 - time (sec): 32.83 - samples/sec: 2051.35 - lr: 0.000024 - momentum: 0.000000
2023-10-13 12:57:29,642 epoch 3 - iter 801/894 - loss 0.09100852 - time (sec): 37.06 - samples/sec: 2060.24 - lr: 0.000024 - momentum: 0.000000
2023-10-13 12:57:34,154 epoch 3 - iter 890/894 - loss 0.09219008 - time (sec): 41.57 - samples/sec: 2072.83 - lr: 0.000023 - momentum: 0.000000
2023-10-13 12:57:34,340 ----------------------------------------------------------------------------------------------------
2023-10-13 12:57:34,340 EPOCH 3 done: loss 0.0919 - lr: 0.000023
2023-10-13 12:57:42,891 DEV : loss 0.169508159160614 - f1-score (micro avg)  0.7217
2023-10-13 12:57:42,921 saving best model
2023-10-13 12:57:43,338 ----------------------------------------------------------------------------------------------------
2023-10-13 12:57:47,481 epoch 4 - iter 89/894 - loss 0.07066538 - time (sec): 4.14 - samples/sec: 2006.71 - lr: 0.000023 - momentum: 0.000000
2023-10-13 12:57:51,485 epoch 4 - iter 178/894 - loss 0.06521536 - time (sec): 8.15 - samples/sec: 2053.11 - lr: 0.000023 - momentum: 0.000000
2023-10-13 12:57:55,505 epoch 4 - iter 267/894 - loss 0.06095332 - time (sec): 12.17 - samples/sec: 2061.80 - lr: 0.000022 - momentum: 0.000000
2023-10-13 12:57:59,865 epoch 4 - iter 356/894 - loss 0.05999315 - time (sec): 16.53 - samples/sec: 2133.54 - lr: 0.000022 - momentum: 0.000000
2023-10-13 12:58:03,915 epoch 4 - iter 445/894 - loss 0.05842531 - time (sec): 20.58 - samples/sec: 2114.71 - lr: 0.000022 - momentum: 0.000000
2023-10-13 12:58:08,237 epoch 4 - iter 534/894 - loss 0.05992046 - time (sec): 24.90 - samples/sec: 2095.16 - lr: 0.000021 - momentum: 0.000000
2023-10-13 12:58:12,418 epoch 4 - iter 623/894 - loss 0.05852169 - time (sec): 29.08 - samples/sec: 2093.88 - lr: 0.000021 - momentum: 0.000000
2023-10-13 12:58:16,496 epoch 4 - iter 712/894 - loss 0.05864176 - time (sec): 33.16 - samples/sec: 2086.38 - lr: 0.000021 - momentum: 0.000000
2023-10-13 12:58:20,933 epoch 4 - iter 801/894 - loss 0.05904040 - time (sec): 37.59 - samples/sec: 2083.56 - lr: 0.000020 - momentum: 0.000000
2023-10-13 12:58:24,881 epoch 4 - iter 890/894 - loss 0.05946598 - time (sec): 41.54 - samples/sec: 2076.25 - lr: 0.000020 - momentum: 0.000000
2023-10-13 12:58:25,052 ----------------------------------------------------------------------------------------------------
2023-10-13 12:58:25,052 EPOCH 4 done: loss 0.0594 - lr: 0.000020
2023-10-13 12:58:33,488 DEV : loss 0.20608599483966827 - f1-score (micro avg)  0.7631
2023-10-13 12:58:33,518 saving best model
2023-10-13 12:58:34,001 ----------------------------------------------------------------------------------------------------
2023-10-13 12:58:38,060 epoch 5 - iter 89/894 - loss 0.05023316 - time (sec): 4.06 - samples/sec: 2106.72 - lr: 0.000020 - momentum: 0.000000
2023-10-13 12:58:42,103 epoch 5 - iter 178/894 - loss 0.04072573 - time (sec): 8.10 - samples/sec: 2066.73 - lr: 0.000019 - momentum: 0.000000
2023-10-13 12:58:46,306 epoch 5 - iter 267/894 - loss 0.04046872 - time (sec): 12.30 - samples/sec: 2056.71 - lr: 0.000019 - momentum: 0.000000
2023-10-13 12:58:50,544 epoch 5 - iter 356/894 - loss 0.04285187 - time (sec): 16.54 - samples/sec: 2012.79 - lr: 0.000019 - momentum: 0.000000
2023-10-13 12:58:54,937 epoch 5 - iter 445/894 - loss 0.04118917 - time (sec): 20.93 - samples/sec: 2034.33 - lr: 0.000018 - momentum: 0.000000
2023-10-13 12:58:59,181 epoch 5 - iter 534/894 - loss 0.04312736 - time (sec): 25.18 - samples/sec: 2018.89 - lr: 0.000018 - momentum: 0.000000
2023-10-13 12:59:03,441 epoch 5 - iter 623/894 - loss 0.04081111 - time (sec): 29.44 - samples/sec: 1999.35 - lr: 0.000018 - momentum: 0.000000
2023-10-13 12:59:07,980 epoch 5 - iter 712/894 - loss 0.04391013 - time (sec): 33.98 - samples/sec: 2030.64 - lr: 0.000017 - momentum: 0.000000
2023-10-13 12:59:12,804 epoch 5 - iter 801/894 - loss 0.04351538 - time (sec): 38.80 - samples/sec: 2012.42 - lr: 0.000017 - momentum: 0.000000
2023-10-13 12:59:17,151 epoch 5 - iter 890/894 - loss 0.04355119 - time (sec): 43.15 - samples/sec: 1999.22 - lr: 0.000017 - momentum: 0.000000
2023-10-13 12:59:17,355 ----------------------------------------------------------------------------------------------------
2023-10-13 12:59:17,355 EPOCH 5 done: loss 0.0435 - lr: 0.000017
2023-10-13 12:59:26,229 DEV : loss 0.20556075870990753 - f1-score (micro avg)  0.7782
2023-10-13 12:59:26,260 saving best model
2023-10-13 12:59:26,730 ----------------------------------------------------------------------------------------------------
2023-10-13 12:59:31,174 epoch 6 - iter 89/894 - loss 0.02131180 - time (sec): 4.44 - samples/sec: 2185.91 - lr: 0.000016 - momentum: 0.000000
2023-10-13 12:59:35,181 epoch 6 - iter 178/894 - loss 0.01740524 - time (sec): 8.45 - samples/sec: 2134.49 - lr: 0.000016 - momentum: 0.000000
2023-10-13 12:59:39,711 epoch 6 - iter 267/894 - loss 0.01734140 - time (sec): 12.98 - samples/sec: 2072.83 - lr: 0.000016 - momentum: 0.000000
2023-10-13 12:59:44,017 epoch 6 - iter 356/894 - loss 0.02350570 - time (sec): 17.28 - samples/sec: 2074.33 - lr: 0.000015 - momentum: 0.000000
2023-10-13 12:59:48,252 epoch 6 - iter 445/894 - loss 0.02543387 - time (sec): 21.52 - samples/sec: 2080.84 - lr: 0.000015 - momentum: 0.000000
2023-10-13 12:59:52,367 epoch 6 - iter 534/894 - loss 0.02567710 - time (sec): 25.63 - samples/sec: 2060.05 - lr: 0.000015 - momentum: 0.000000
2023-10-13 12:59:56,761 epoch 6 - iter 623/894 - loss 0.02518657 - time (sec): 30.03 - samples/sec: 2023.69 - lr: 0.000014 - momentum: 0.000000
2023-10-13 13:00:00,899 epoch 6 - iter 712/894 - loss 0.02743067 - time (sec): 34.17 - samples/sec: 2015.73 - lr: 0.000014 - momentum: 0.000000
2023-10-13 13:00:05,151 epoch 6 - iter 801/894 - loss 0.02804290 - time (sec): 38.42 - samples/sec: 2022.12 - lr: 0.000014 - momentum: 0.000000
2023-10-13 13:00:09,498 epoch 6 - iter 890/894 - loss 0.02756516 - time (sec): 42.77 - samples/sec: 2017.73 - lr: 0.000013 - momentum: 0.000000
2023-10-13 13:00:09,683 ----------------------------------------------------------------------------------------------------
2023-10-13 13:00:09,684 EPOCH 6 done: loss 0.0277 - lr: 0.000013
2023-10-13 13:00:18,535 DEV : loss 0.20008665323257446 - f1-score (micro avg)  0.7608
2023-10-13 13:00:18,570 ----------------------------------------------------------------------------------------------------
2023-10-13 13:00:22,657 epoch 7 - iter 89/894 - loss 0.02473568 - time (sec): 4.09 - samples/sec: 2283.61 - lr: 0.000013 - momentum: 0.000000
2023-10-13 13:00:27,033 epoch 7 - iter 178/894 - loss 0.02214234 - time (sec): 8.46 - samples/sec: 2258.02 - lr: 0.000013 - momentum: 0.000000
2023-10-13 13:00:31,190 epoch 7 - iter 267/894 - loss 0.01933197 - time (sec): 12.62 - samples/sec: 2215.83 - lr: 0.000012 - momentum: 0.000000
2023-10-13 13:00:35,256 epoch 7 - iter 356/894 - loss 0.01840027 - time (sec): 16.68 - samples/sec: 2236.09 - lr: 0.000012 - momentum: 0.000000
2023-10-13 13:00:39,376 epoch 7 - iter 445/894 - loss 0.01736996 - time (sec): 20.81 - samples/sec: 2212.78 - lr: 0.000012 - momentum: 0.000000
2023-10-13 13:00:43,341 epoch 7 - iter 534/894 - loss 0.01710256 - time (sec): 24.77 - samples/sec: 2178.14 - lr: 0.000011 - momentum: 0.000000
2023-10-13 13:00:47,660 epoch 7 - iter 623/894 - loss 0.01847066 - time (sec): 29.09 - samples/sec: 2116.68 - lr: 0.000011 - momentum: 0.000000
2023-10-13 13:00:51,880 epoch 7 - iter 712/894 - loss 0.01768895 - time (sec): 33.31 - samples/sec: 2101.35 - lr: 0.000011 - momentum: 0.000000
2023-10-13 13:00:55,976 epoch 7 - iter 801/894 - loss 0.01830601 - time (sec): 37.40 - samples/sec: 2086.22 - lr: 0.000010 - momentum: 0.000000
2023-10-13 13:01:00,075 epoch 7 - iter 890/894 - loss 0.01838476 - time (sec): 41.50 - samples/sec: 2074.90 - lr: 0.000010 - momentum: 0.000000
2023-10-13 13:01:00,255 ----------------------------------------------------------------------------------------------------
2023-10-13 13:01:00,255 EPOCH 7 done: loss 0.0183 - lr: 0.000010
2023-10-13 13:01:09,025 DEV : loss 0.23889903724193573 - f1-score (micro avg)  0.778
2023-10-13 13:01:09,055 ----------------------------------------------------------------------------------------------------
2023-10-13 13:01:13,200 epoch 8 - iter 89/894 - loss 0.01264123 - time (sec): 4.14 - samples/sec: 2109.44 - lr: 0.000010 - momentum: 0.000000
2023-10-13 13:01:17,329 epoch 8 - iter 178/894 - loss 0.01441008 - time (sec): 8.27 - samples/sec: 2044.81 - lr: 0.000009 - momentum: 0.000000
2023-10-13 13:01:21,833 epoch 8 - iter 267/894 - loss 0.01359192 - time (sec): 12.78 - samples/sec: 2117.06 - lr: 0.000009 - momentum: 0.000000
2023-10-13 13:01:26,166 epoch 8 - iter 356/894 - loss 0.01220036 - time (sec): 17.11 - samples/sec: 2075.45 - lr: 0.000009 - momentum: 0.000000
2023-10-13 13:01:30,406 epoch 8 - iter 445/894 - loss 0.01217291 - time (sec): 21.35 - samples/sec: 2063.59 - lr: 0.000008 - momentum: 0.000000
2023-10-13 13:01:34,751 epoch 8 - iter 534/894 - loss 0.01395930 - time (sec): 25.69 - samples/sec: 2061.54 - lr: 0.000008 - momentum: 0.000000
2023-10-13 13:01:38,837 epoch 8 - iter 623/894 - loss 0.01363106 - time (sec): 29.78 - samples/sec: 2061.57 - lr: 0.000008 - momentum: 0.000000
2023-10-13 13:01:43,036 epoch 8 - iter 712/894 - loss 0.01331376 - time (sec): 33.98 - samples/sec: 2049.69 - lr: 0.000007 - momentum: 0.000000
2023-10-13 13:01:47,676 epoch 8 - iter 801/894 - loss 0.01313221 - time (sec): 38.62 - samples/sec: 2032.02 - lr: 0.000007 - momentum: 0.000000
2023-10-13 13:01:52,010 epoch 8 - iter 890/894 - loss 0.01300296 - time (sec): 42.95 - samples/sec: 2008.08 - lr: 0.000007 - momentum: 0.000000
2023-10-13 13:01:52,187 ----------------------------------------------------------------------------------------------------
2023-10-13 13:01:52,188 EPOCH 8 done: loss 0.0131 - lr: 0.000007
2023-10-13 13:02:01,000 DEV : loss 0.23165372014045715 - f1-score (micro avg)  0.7734
2023-10-13 13:02:01,032 ----------------------------------------------------------------------------------------------------
2023-10-13 13:02:05,174 epoch 9 - iter 89/894 - loss 0.00510865 - time (sec): 4.14 - samples/sec: 1989.00 - lr: 0.000006 - momentum: 0.000000
2023-10-13 13:02:09,509 epoch 9 - iter 178/894 - loss 0.00451444 - time (sec): 8.48 - samples/sec: 2090.14 - lr: 0.000006 - momentum: 0.000000
2023-10-13 13:02:13,747 epoch 9 - iter 267/894 - loss 0.00647433 - time (sec): 12.71 - samples/sec: 2067.70 - lr: 0.000006 - momentum: 0.000000
2023-10-13 13:02:18,039 epoch 9 - iter 356/894 - loss 0.00857438 - time (sec): 17.01 - samples/sec: 2043.58 - lr: 0.000005 - momentum: 0.000000
2023-10-13 13:02:22,150 epoch 9 - iter 445/894 - loss 0.01024290 - time (sec): 21.12 - samples/sec: 2049.28 - lr: 0.000005 - momentum: 0.000000
2023-10-13 13:02:26,250 epoch 9 - iter 534/894 - loss 0.00899714 - time (sec): 25.22 - samples/sec: 2043.77 - lr: 0.000005 - momentum: 0.000000
2023-10-13 13:02:30,329 epoch 9 - iter 623/894 - loss 0.00792510 - time (sec): 29.30 - samples/sec: 2035.48 - lr: 0.000004 - momentum: 0.000000
2023-10-13 13:02:34,640 epoch 9 - iter 712/894 - loss 0.00742665 - time (sec): 33.61 - samples/sec: 2032.31 - lr: 0.000004 - momentum: 0.000000
2023-10-13 13:02:38,893 epoch 9 - iter 801/894 - loss 0.00799065 - time (sec): 37.86 - samples/sec: 2058.77 - lr: 0.000004 - momentum: 0.000000
2023-10-13 13:02:42,941 epoch 9 - iter 890/894 - loss 0.00818853 - time (sec): 41.91 - samples/sec: 2056.11 - lr: 0.000003 - momentum: 0.000000
2023-10-13 13:02:43,123 ----------------------------------------------------------------------------------------------------
2023-10-13 13:02:43,123 EPOCH 9 done: loss 0.0084 - lr: 0.000003
2023-10-13 13:02:51,867 DEV : loss 0.2373068630695343 - f1-score (micro avg)  0.7803
2023-10-13 13:02:51,898 saving best model
2023-10-13 13:02:52,380 ----------------------------------------------------------------------------------------------------
2023-10-13 13:02:56,667 epoch 10 - iter 89/894 - loss 0.00299024 - time (sec): 4.28 - samples/sec: 2137.65 - lr: 0.000003 - momentum: 0.000000
2023-10-13 13:03:00,728 epoch 10 - iter 178/894 - loss 0.00773541 - time (sec): 8.34 - samples/sec: 2091.33 - lr: 0.000003 - momentum: 0.000000
2023-10-13 13:03:05,105 epoch 10 - iter 267/894 - loss 0.00678755 - time (sec): 12.72 - samples/sec: 2063.17 - lr: 0.000002 - momentum: 0.000000
2023-10-13 13:03:09,449 epoch 10 - iter 356/894 - loss 0.00753006 - time (sec): 17.06 - samples/sec: 2048.86 - lr: 0.000002 - momentum: 0.000000
2023-10-13 13:03:13,636 epoch 10 - iter 445/894 - loss 0.00720676 - time (sec): 21.25 - samples/sec: 2025.78 - lr: 0.000002 - momentum: 0.000000
2023-10-13 13:03:17,696 epoch 10 - iter 534/894 - loss 0.00687943 - time (sec): 25.31 - samples/sec: 2026.65 - lr: 0.000001 - momentum: 0.000000
2023-10-13 13:03:22,040 epoch 10 - iter 623/894 - loss 0.00586593 - time (sec): 29.65 - samples/sec: 2033.97 - lr: 0.000001 - momentum: 0.000000
2023-10-13 13:03:26,626 epoch 10 - iter 712/894 - loss 0.00544570 - time (sec): 34.24 - samples/sec: 2055.88 - lr: 0.000001 - momentum: 0.000000
2023-10-13 13:03:30,764 epoch 10 - iter 801/894 - loss 0.00536573 - time (sec): 38.38 - samples/sec: 2038.80 - lr: 0.000000 - momentum: 0.000000
2023-10-13 13:03:34,874 epoch 10 - iter 890/894 - loss 0.00540931 - time (sec): 42.49 - samples/sec: 2027.55 - lr: 0.000000 - momentum: 0.000000
2023-10-13 13:03:35,057 ----------------------------------------------------------------------------------------------------
2023-10-13 13:03:35,058 EPOCH 10 done: loss 0.0054 - lr: 0.000000
2023-10-13 13:03:43,593 DEV : loss 0.23833510279655457 - f1-score (micro avg)  0.7798
2023-10-13 13:03:43,975 ----------------------------------------------------------------------------------------------------
2023-10-13 13:03:43,976 Loading model from best epoch ...
2023-10-13 13:03:45,339 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
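The 21-tag dictionary follows the BIOES scheme: one outside tag `O` plus S/B/E/I tags for each of the five entity types, so 1 + 5 × 4 = 21. This is also the output size of the final `Linear(in_features=768, out_features=21)` layer in the model summary. The sketch below rebuilds the tag list in the order printed above. The helper name `bioes_tags` is hypothetical and is not a Flair API.

```python
# Hypothetical reconstruction of the printed tag dictionary (not Flair's own code).
ENTITY_TYPES = ["loc", "pers", "org", "prod", "time"]
PREFIXES = ["S", "B", "E", "I"]  # single, begin, end, inside (O is shared)


def bioes_tags(entity_types: list[str]) -> list[str]:
    """Return 'O' followed by one tag per (entity type, BIOES prefix) pair."""
    return ["O"] + [f"{prefix}-{etype}" for etype in entity_types for prefix in PREFIXES]


if __name__ == "__main__":
    tags = bioes_tags(ENTITY_TYPES)
    print(len(tags))  # 21
    print(", ".join(tags))
```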
2023-10-13 13:03:49,655 
Results:
- F-score (micro) 0.7373
- F-score (macro) 0.662
- Accuracy 0.6036

By class:
              precision    recall  f1-score   support

         loc     0.8468    0.8255    0.8360       596
        pers     0.6525    0.7387    0.6930       333
         org     0.4965    0.5303    0.5128       132
        prod     0.6531    0.4848    0.5565        66
        time     0.6727    0.7551    0.7115        49

   micro avg     0.7290    0.7457    0.7373      1176
   macro avg     0.6643    0.6669    0.6620      1176
weighted avg     0.7343    0.7457    0.7384      1176
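The summary rows can be cross-checked against the per-class rows. In the sketch below, each F1 is the harmonic mean of precision and recall. "macro avg" F1 is the unweighted mean of the per-class F1 scores, and "weighted avg" F1 weights them by support. The sketch also shows that the reported macro F1 (0.6620) is *not* the harmonic mean of macro precision and macro recall (that would give about 0.6656), which is consistent with a scikit-learn-style classification report. Because the table rounds precision and recall to four decimals, the recomputed values can differ from the table in the last digit. The micro average cannot be rebuilt from these rows without the underlying true-positive, false-positive, and false-negative counts, so it is checked only from its own precision and recall.

```python
# Per-class (precision, recall, support) copied from the "By class" table above.
BY_CLASS = {
    "loc": (0.8468, 0.8255, 596),
    "pers": (0.6525, 0.7387, 333),
    "org": (0.4965, 0.5303, 132),
    "prod": (0.6531, 0.4848, 66),
    "time": (0.6727, 0.7551, 49),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def summarize(by_class: dict) -> tuple:
    """Return per-class F1, macro F1, support-weighted F1, and total support."""
    f1s = {name: f1(p, r) for name, (p, r, _) in by_class.items()}
    total_support = sum(s for _, _, s in by_class.values())
    macro_f1 = sum(f1s.values()) / len(f1s)
    weighted_f1 = sum(f1s[name] * s for name, (_, _, s) in by_class.items()) / total_support
    return f1s, macro_f1, weighted_f1, total_support


if __name__ == "__main__":
    f1s, macro_f1, weighted_f1, total = summarize(BY_CLASS)
    for name, score in f1s.items():
        print(f"{name:>5} f1 = {score:.4f}")
    print(f"support      = {total}")
    print(f"macro f1     = {macro_f1:.4f}")
    print(f"weighted f1  = {weighted_f1:.4f}")
    print(f"micro f1     = {f1(0.7290, 0.7457):.4f}")  # from the micro avg row
    print(f"f1(macro P, macro R) = {f1(0.6643, 0.6669):.4f}")  # differs from 0.6620
```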

2023-10-13 13:03:49,655 ----------------------------------------------------------------------------------------------------