stefan-it committed
Commit dae8a4e
1 Parent(s): 13645c6

Upload folder using huggingface_hub

Files changed (5)
  1. best-model.pt +3 -0
  2. dev.tsv +0 -0
  3. loss.tsv +11 -0
  4. test.tsv +0 -0
  5. training.log +243 -0
best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2db67e9b0a4d5462817d2fafaf260f0136ac5e8af7e51966d7f2aa80fcd6eb78
+ size 443335879
dev.tsv ADDED
The diff for this file is too large to render. See raw diff
 
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH  TIMESTAMP  LEARNING_RATE  TRAIN_LOSS  DEV_LOSS  DEV_PRECISION  DEV_RECALL  DEV_F1  DEV_ACCURACY
+ 1      13:20:41   0.0000         0.6389      0.1785    0.7203         0.6122      0.6619  0.5045
+ 2      13:21:19   0.0000         0.1514      0.1278    0.6438         0.7631      0.6984  0.5571
+ 3      13:21:56   0.0000         0.0808      0.1350    0.7082         0.7592      0.7328  0.5946
+ 4      13:22:35   0.0000         0.0529      0.1548    0.7687         0.7795      0.7741  0.6516
+ 5      13:23:13   0.0000         0.0353      0.1776    0.7543         0.7826      0.7682  0.6425
+ 6      13:23:50   0.0000         0.0205      0.2128    0.7598         0.7568      0.7583  0.6343
+ 7      13:24:27   0.0000         0.0163      0.2125    0.7545         0.7928      0.7732  0.6463
+ 8      13:25:05   0.0000         0.0096      0.2141    0.7734         0.7952      0.7841  0.6630
+ 9      13:25:43   0.0000         0.0066      0.2306    0.7703         0.7920      0.7810  0.6578
+ 10     13:26:21   0.0000         0.0030      0.2350    0.7713         0.7991      0.7849  0.6632
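The DEV_F1 column can be used to work out which epochs should have produced a new best-model.pt checkpoint. The following is a minimal Python sketch, not part of the commit. It assumes the rows can be split on whitespace (the real file is presumably tab-separated) and that a checkpoint is written whenever dev F1 strictly improves. The helper names `parse_loss_table` and `improving_epochs` are hypothetical.

```python
# Rows copied from the loss.tsv diff; splitting on whitespace is an assumption
# about the rendering (the file itself is presumably tab-separated).
LOSS_TSV = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 13:20:41 0.0000 0.6389 0.1785 0.7203 0.6122 0.6619 0.5045
2 13:21:19 0.0000 0.1514 0.1278 0.6438 0.7631 0.6984 0.5571
3 13:21:56 0.0000 0.0808 0.1350 0.7082 0.7592 0.7328 0.5946
4 13:22:35 0.0000 0.0529 0.1548 0.7687 0.7795 0.7741 0.6516
5 13:23:13 0.0000 0.0353 0.1776 0.7543 0.7826 0.7682 0.6425
6 13:23:50 0.0000 0.0205 0.2128 0.7598 0.7568 0.7583 0.6343
7 13:24:27 0.0000 0.0163 0.2125 0.7545 0.7928 0.7732 0.6463
8 13:25:05 0.0000 0.0096 0.2141 0.7734 0.7952 0.7841 0.6630
9 13:25:43 0.0000 0.0066 0.2306 0.7703 0.7920 0.7810 0.6578
10 13:26:21 0.0000 0.0030 0.2350 0.7713 0.7991 0.7849 0.6632
"""


def parse_loss_table(text):
    """Turn the whitespace-separated table into a list of dicts keyed by column name."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def improving_epochs(rows, metric="DEV_F1"):
    """Return the epochs at which the metric strictly exceeds every earlier value."""
    best = float("-inf")
    improved = []
    for row in rows:
        score = float(row[metric])
        if score > best:
            best = score
            improved.append(int(row["EPOCH"]))
    return improved


rows = parse_loss_table(LOSS_TSV)
best_row = max(rows, key=lambda r: float(r["DEV_F1"]))
print("best epoch:", best_row["EPOCH"], "dev F1:", best_row["DEV_F1"])
print("epochs with a new best dev F1:", improving_epochs(rows))
```

Under these assumptions the improving epochs are 1, 2, 3, 4, 8 and 10, which lines up with the "saving best model" entries in training.log. Note that dev loss rises from epoch 3 onward while dev F1 keeps improving, so selecting on loss would pick a different checkpoint.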
test.tsv ADDED
The diff for this file is too large to render. See raw diff
 
training.log ADDED
@@ -0,0 +1,243 @@
+ 2023-10-13 13:20:08,359 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:20:08,360 Model: "SequenceTagger(
+   (embeddings): TransformerWordEmbeddings(
+     (model): BertModel(
+       (embeddings): BertEmbeddings(
+         (word_embeddings): Embedding(32001, 768)
+         (position_embeddings): Embedding(512, 768)
+         (token_type_embeddings): Embedding(2, 768)
+         (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+       (encoder): BertEncoder(
+         (layer): ModuleList(
+           (0-11): 12 x BertLayer(
+             (attention): BertAttention(
+               (self): BertSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): BertSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): BertIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): BertOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+         )
+       )
+       (pooler): BertPooler(
+         (dense): Linear(in_features=768, out_features=768, bias=True)
+         (activation): Tanh()
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=768, out_features=21, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-13 13:20:08,360 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:20:08,360 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
+ - NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
+ 2023-10-13 13:20:08,360 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:20:08,360 Train: 3575 sentences
+ 2023-10-13 13:20:08,360 (train_with_dev=False, train_with_test=False)
+ 2023-10-13 13:20:08,360 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:20:08,360 Training Params:
+ 2023-10-13 13:20:08,360 - learning_rate: "5e-05"
+ 2023-10-13 13:20:08,360 - mini_batch_size: "8"
+ 2023-10-13 13:20:08,360 - max_epochs: "10"
+ 2023-10-13 13:20:08,360 - shuffle: "True"
+ 2023-10-13 13:20:08,361 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:20:08,361 Plugins:
+ 2023-10-13 13:20:08,361 - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-13 13:20:08,361 ----------------------------------------------------------------------------------------------------
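The `lr` values logged during training are consistent with a linear warmup over the first 10% of steps followed by a linear decay to zero. The sketch below reproduces that shape in plain Python. It is an illustration inferred from the logged parameters (learning_rate 5e-05, 447 iterations per epoch, 10 epochs, warmup_fraction 0.1), not the code of the LinearScheduler plugin itself, and the function name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(step, peak_lr=5e-05, steps_per_epoch=447, epochs=10,
                       warmup_fraction=0.1):
    """Learning rate after `step` optimizer steps under linear warmup + linear decay.

    Assumption: warmup covers the first `warmup_fraction` of all steps, then the
    rate falls linearly to zero at the final step.
    """
    total_steps = steps_per_epoch * epochs
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: ramp linearly from peak_lr down to 0.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


def global_step(epoch, iteration, steps_per_epoch=447):
    """Convert a 1-based (epoch, iteration) pair from the log into a step count."""
    return (epoch - 1) * steps_per_epoch + iteration


for epoch, iteration in [(1, 44), (1, 440), (2, 44), (5, 44), (10, 440)]:
    lr = linear_schedule_lr(global_step(epoch, iteration))
    print(f"epoch {epoch} - iter {iteration}/447 - lr: {lr:.6f}")
```

With these assumptions the warmup spans exactly the first epoch (447 of 4470 steps), which would explain why the rate climbs throughout epoch 1 and falls from epoch 2 onward.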
+ 2023-10-13 13:20:08,361 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-13 13:20:08,361 - metric: "('micro avg', 'f1-score')"
+ 2023-10-13 13:20:08,361 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:20:08,361 Computation:
+ 2023-10-13 13:20:08,361 - compute on device: cuda:0
+ 2023-10-13 13:20:08,361 - embedding storage: none
+ 2023-10-13 13:20:08,361 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:20:08,361 Model training base path: "hmbench-hipe2020/de-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-4"
+ 2023-10-13 13:20:08,361 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:20:08,361 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:20:11,185 epoch 1 - iter 44/447 - loss 2.98851820 - time (sec): 2.82 - samples/sec: 3102.17 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-13 13:20:14,004 epoch 1 - iter 88/447 - loss 1.95325123 - time (sec): 5.64 - samples/sec: 3148.92 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-13 13:20:16,621 epoch 1 - iter 132/447 - loss 1.49794150 - time (sec): 8.26 - samples/sec: 3114.70 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-13 13:20:19,630 epoch 1 - iter 176/447 - loss 1.20997650 - time (sec): 11.27 - samples/sec: 3062.86 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-13 13:20:22,366 epoch 1 - iter 220/447 - loss 1.03351243 - time (sec): 14.00 - samples/sec: 3050.14 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-13 13:20:25,109 epoch 1 - iter 264/447 - loss 0.91449018 - time (sec): 16.75 - samples/sec: 3041.16 - lr: 0.000029 - momentum: 0.000000
+ 2023-10-13 13:20:27,854 epoch 1 - iter 308/447 - loss 0.82776650 - time (sec): 19.49 - samples/sec: 3038.53 - lr: 0.000034 - momentum: 0.000000
+ 2023-10-13 13:20:30,594 epoch 1 - iter 352/447 - loss 0.75655567 - time (sec): 22.23 - samples/sec: 3042.01 - lr: 0.000039 - momentum: 0.000000
+ 2023-10-13 13:20:33,213 epoch 1 - iter 396/447 - loss 0.69589232 - time (sec): 24.85 - samples/sec: 3043.96 - lr: 0.000044 - momentum: 0.000000
+ 2023-10-13 13:20:36,387 epoch 1 - iter 440/447 - loss 0.64524193 - time (sec): 28.03 - samples/sec: 3041.44 - lr: 0.000049 - momentum: 0.000000
+ 2023-10-13 13:20:36,800 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:20:36,800 EPOCH 1 done: loss 0.6389 - lr: 0.000049
+ 2023-10-13 13:20:41,834 DEV : loss 0.17851579189300537 - f1-score (micro avg) 0.6619
+ 2023-10-13 13:20:41,868 saving best model
+ 2023-10-13 13:20:42,189 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:20:45,136 epoch 2 - iter 44/447 - loss 0.18786776 - time (sec): 2.95 - samples/sec: 3040.94 - lr: 0.000049 - momentum: 0.000000
+ 2023-10-13 13:20:48,309 epoch 2 - iter 88/447 - loss 0.18005101 - time (sec): 6.12 - samples/sec: 3024.51 - lr: 0.000049 - momentum: 0.000000
+ 2023-10-13 13:20:50,957 epoch 2 - iter 132/447 - loss 0.17172587 - time (sec): 8.77 - samples/sec: 2987.88 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-13 13:20:53,671 epoch 2 - iter 176/447 - loss 0.17670397 - time (sec): 11.48 - samples/sec: 3002.83 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-13 13:20:56,617 epoch 2 - iter 220/447 - loss 0.17152176 - time (sec): 14.43 - samples/sec: 2986.38 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-13 13:20:59,327 epoch 2 - iter 264/447 - loss 0.16193941 - time (sec): 17.14 - samples/sec: 3019.15 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-13 13:21:01,946 epoch 2 - iter 308/447 - loss 0.15929042 - time (sec): 19.75 - samples/sec: 3022.81 - lr: 0.000046 - momentum: 0.000000
+ 2023-10-13 13:21:04,541 epoch 2 - iter 352/447 - loss 0.15834174 - time (sec): 22.35 - samples/sec: 3031.62 - lr: 0.000046 - momentum: 0.000000
+ 2023-10-13 13:21:07,674 epoch 2 - iter 396/447 - loss 0.15381226 - time (sec): 25.48 - samples/sec: 3014.93 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-13 13:21:10,380 epoch 2 - iter 440/447 - loss 0.15235393 - time (sec): 28.19 - samples/sec: 3024.09 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-13 13:21:10,888 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:21:10,889 EPOCH 2 done: loss 0.1514 - lr: 0.000045
+ 2023-10-13 13:21:19,373 DEV : loss 0.12778525054454803 - f1-score (micro avg) 0.6984
+ 2023-10-13 13:21:19,406 saving best model
+ 2023-10-13 13:21:19,820 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:21:22,656 epoch 3 - iter 44/447 - loss 0.10065484 - time (sec): 2.83 - samples/sec: 3018.94 - lr: 0.000044 - momentum: 0.000000
+ 2023-10-13 13:21:25,786 epoch 3 - iter 88/447 - loss 0.08666742 - time (sec): 5.96 - samples/sec: 2992.75 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-13 13:21:28,598 epoch 3 - iter 132/447 - loss 0.08572550 - time (sec): 8.78 - samples/sec: 2986.73 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-13 13:21:31,298 epoch 3 - iter 176/447 - loss 0.08491542 - time (sec): 11.48 - samples/sec: 3003.02 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-13 13:21:33,912 epoch 3 - iter 220/447 - loss 0.08321697 - time (sec): 14.09 - samples/sec: 2985.23 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-13 13:21:36,687 epoch 3 - iter 264/447 - loss 0.08312643 - time (sec): 16.87 - samples/sec: 2997.64 - lr: 0.000041 - momentum: 0.000000
+ 2023-10-13 13:21:39,389 epoch 3 - iter 308/447 - loss 0.08303934 - time (sec): 19.57 - samples/sec: 2999.67 - lr: 0.000041 - momentum: 0.000000
+ 2023-10-13 13:21:42,237 epoch 3 - iter 352/447 - loss 0.07934551 - time (sec): 22.41 - samples/sec: 3012.75 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-13 13:21:44,842 epoch 3 - iter 396/447 - loss 0.08105453 - time (sec): 25.02 - samples/sec: 3030.91 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-13 13:21:47,987 epoch 3 - iter 440/447 - loss 0.08108549 - time (sec): 28.16 - samples/sec: 3028.88 - lr: 0.000039 - momentum: 0.000000
+ 2023-10-13 13:21:48,390 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:21:48,390 EPOCH 3 done: loss 0.0808 - lr: 0.000039
+ 2023-10-13 13:21:56,816 DEV : loss 0.13502156734466553 - f1-score (micro avg) 0.7328
+ 2023-10-13 13:21:56,851 saving best model
+ 2023-10-13 13:21:57,269 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:21:59,999 epoch 4 - iter 44/447 - loss 0.04201711 - time (sec): 2.72 - samples/sec: 3098.25 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-13 13:22:02,677 epoch 4 - iter 88/447 - loss 0.05222326 - time (sec): 5.40 - samples/sec: 3037.66 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-13 13:22:05,495 epoch 4 - iter 132/447 - loss 0.04940786 - time (sec): 8.22 - samples/sec: 3037.73 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-13 13:22:08,212 epoch 4 - iter 176/447 - loss 0.04414691 - time (sec): 10.94 - samples/sec: 3046.88 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-13 13:22:11,658 epoch 4 - iter 220/447 - loss 0.04915488 - time (sec): 14.38 - samples/sec: 3015.37 - lr: 0.000036 - momentum: 0.000000
+ 2023-10-13 13:22:14,498 epoch 4 - iter 264/447 - loss 0.05177181 - time (sec): 17.22 - samples/sec: 3015.62 - lr: 0.000036 - momentum: 0.000000
+ 2023-10-13 13:22:17,164 epoch 4 - iter 308/447 - loss 0.04997384 - time (sec): 19.89 - samples/sec: 3006.17 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-13 13:22:19,895 epoch 4 - iter 352/447 - loss 0.05023042 - time (sec): 22.62 - samples/sec: 2998.23 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-13 13:22:23,206 epoch 4 - iter 396/447 - loss 0.05174515 - time (sec): 25.93 - samples/sec: 2981.06 - lr: 0.000034 - momentum: 0.000000
+ 2023-10-13 13:22:25,978 epoch 4 - iter 440/447 - loss 0.05238596 - time (sec): 28.70 - samples/sec: 2969.31 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-13 13:22:26,430 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:22:26,431 EPOCH 4 done: loss 0.0529 - lr: 0.000033
+ 2023-10-13 13:22:35,016 DEV : loss 0.15478116273880005 - f1-score (micro avg) 0.7741
+ 2023-10-13 13:22:35,048 saving best model
+ 2023-10-13 13:22:35,500 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:22:38,504 epoch 5 - iter 44/447 - loss 0.03979897 - time (sec): 3.00 - samples/sec: 2988.15 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-13 13:22:41,411 epoch 5 - iter 88/447 - loss 0.03334224 - time (sec): 5.91 - samples/sec: 2922.69 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-13 13:22:44,290 epoch 5 - iter 132/447 - loss 0.03321982 - time (sec): 8.79 - samples/sec: 2971.55 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-13 13:22:47,083 epoch 5 - iter 176/447 - loss 0.03182161 - time (sec): 11.58 - samples/sec: 2994.52 - lr: 0.000031 - momentum: 0.000000
+ 2023-10-13 13:22:49,725 epoch 5 - iter 220/447 - loss 0.03370102 - time (sec): 14.22 - samples/sec: 2998.29 - lr: 0.000031 - momentum: 0.000000
+ 2023-10-13 13:22:52,582 epoch 5 - iter 264/447 - loss 0.03501394 - time (sec): 17.08 - samples/sec: 3001.26 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-13 13:22:55,826 epoch 5 - iter 308/447 - loss 0.03844377 - time (sec): 20.32 - samples/sec: 2980.60 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-13 13:22:58,482 epoch 5 - iter 352/447 - loss 0.03808194 - time (sec): 22.98 - samples/sec: 2984.83 - lr: 0.000029 - momentum: 0.000000
+ 2023-10-13 13:23:01,347 epoch 5 - iter 396/447 - loss 0.03629037 - time (sec): 25.85 - samples/sec: 2970.33 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-13 13:23:04,266 epoch 5 - iter 440/447 - loss 0.03561767 - time (sec): 28.76 - samples/sec: 2968.14 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-13 13:23:04,682 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:23:04,682 EPOCH 5 done: loss 0.0353 - lr: 0.000028
+ 2023-10-13 13:23:13,184 DEV : loss 0.17760951817035675 - f1-score (micro avg) 0.7682
+ 2023-10-13 13:23:13,217 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:23:16,042 epoch 6 - iter 44/447 - loss 0.01877968 - time (sec): 2.82 - samples/sec: 3048.07 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-13 13:23:18,942 epoch 6 - iter 88/447 - loss 0.01932285 - time (sec): 5.72 - samples/sec: 3077.70 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-13 13:23:21,625 epoch 6 - iter 132/447 - loss 0.02015927 - time (sec): 8.41 - samples/sec: 3088.96 - lr: 0.000026 - momentum: 0.000000
+ 2023-10-13 13:23:24,907 epoch 6 - iter 176/447 - loss 0.01821601 - time (sec): 11.69 - samples/sec: 3065.39 - lr: 0.000026 - momentum: 0.000000
+ 2023-10-13 13:23:27,771 epoch 6 - iter 220/447 - loss 0.01881791 - time (sec): 14.55 - samples/sec: 2979.48 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-13 13:23:30,488 epoch 6 - iter 264/447 - loss 0.01860310 - time (sec): 17.27 - samples/sec: 2980.78 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-13 13:23:33,366 epoch 6 - iter 308/447 - loss 0.01815398 - time (sec): 20.15 - samples/sec: 2976.57 - lr: 0.000024 - momentum: 0.000000
+ 2023-10-13 13:23:36,065 epoch 6 - iter 352/447 - loss 0.01915966 - time (sec): 22.85 - samples/sec: 2977.83 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-13 13:23:38,865 epoch 6 - iter 396/447 - loss 0.01991320 - time (sec): 25.65 - samples/sec: 2999.07 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-13 13:23:41,624 epoch 6 - iter 440/447 - loss 0.02043759 - time (sec): 28.41 - samples/sec: 3003.11 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-13 13:23:42,034 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:23:42,034 EPOCH 6 done: loss 0.0205 - lr: 0.000022
+ 2023-10-13 13:23:50,494 DEV : loss 0.21282611787319183 - f1-score (micro avg) 0.7583
+ 2023-10-13 13:23:50,525 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:23:53,838 epoch 7 - iter 44/447 - loss 0.02198471 - time (sec): 3.31 - samples/sec: 2998.52 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-13 13:23:56,555 epoch 7 - iter 88/447 - loss 0.01806647 - time (sec): 6.03 - samples/sec: 2959.07 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-13 13:23:59,442 epoch 7 - iter 132/447 - loss 0.01433410 - time (sec): 8.92 - samples/sec: 2982.30 - lr: 0.000021 - momentum: 0.000000
+ 2023-10-13 13:24:02,316 epoch 7 - iter 176/447 - loss 0.01416791 - time (sec): 11.79 - samples/sec: 3005.39 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-13 13:24:05,100 epoch 7 - iter 220/447 - loss 0.01395297 - time (sec): 14.57 - samples/sec: 3015.13 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-13 13:24:07,721 epoch 7 - iter 264/447 - loss 0.01425224 - time (sec): 17.19 - samples/sec: 3003.40 - lr: 0.000019 - momentum: 0.000000
+ 2023-10-13 13:24:10,471 epoch 7 - iter 308/447 - loss 0.01599349 - time (sec): 19.95 - samples/sec: 3015.57 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-13 13:24:13,263 epoch 7 - iter 352/447 - loss 0.01639863 - time (sec): 22.74 - samples/sec: 3007.11 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-13 13:24:15,883 epoch 7 - iter 396/447 - loss 0.01654187 - time (sec): 25.36 - samples/sec: 3012.06 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-13 13:24:18,717 epoch 7 - iter 440/447 - loss 0.01632206 - time (sec): 28.19 - samples/sec: 3031.51 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-13 13:24:19,099 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:24:19,099 EPOCH 7 done: loss 0.0163 - lr: 0.000017
+ 2023-10-13 13:24:27,956 DEV : loss 0.21246877312660217 - f1-score (micro avg) 0.7732
+ 2023-10-13 13:24:27,990 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:24:30,849 epoch 8 - iter 44/447 - loss 0.01236522 - time (sec): 2.86 - samples/sec: 3006.90 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-13 13:24:33,776 epoch 8 - iter 88/447 - loss 0.01153258 - time (sec): 5.78 - samples/sec: 2962.65 - lr: 0.000016 - momentum: 0.000000
+ 2023-10-13 13:24:36,431 epoch 8 - iter 132/447 - loss 0.01262017 - time (sec): 8.44 - samples/sec: 3003.64 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-13 13:24:39,104 epoch 8 - iter 176/447 - loss 0.01112667 - time (sec): 11.11 - samples/sec: 3015.10 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-13 13:24:41,911 epoch 8 - iter 220/447 - loss 0.00946565 - time (sec): 13.92 - samples/sec: 3002.28 - lr: 0.000014 - momentum: 0.000000
+ 2023-10-13 13:24:44,625 epoch 8 - iter 264/447 - loss 0.00948303 - time (sec): 16.63 - samples/sec: 3018.28 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-13 13:24:47,483 epoch 8 - iter 308/447 - loss 0.00851173 - time (sec): 19.49 - samples/sec: 2998.77 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-13 13:24:50,628 epoch 8 - iter 352/447 - loss 0.00968723 - time (sec): 22.64 - samples/sec: 2987.84 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-13 13:24:53,754 epoch 8 - iter 396/447 - loss 0.00996054 - time (sec): 25.76 - samples/sec: 2977.54 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-13 13:24:56,483 epoch 8 - iter 440/447 - loss 0.00980498 - time (sec): 28.49 - samples/sec: 2985.89 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-13 13:24:56,967 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:24:56,967 EPOCH 8 done: loss 0.0096 - lr: 0.000011
+ 2023-10-13 13:25:05,316 DEV : loss 0.21407929062843323 - f1-score (micro avg) 0.7841
+ 2023-10-13 13:25:05,349 saving best model
+ 2023-10-13 13:25:06,114 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:25:08,889 epoch 9 - iter 44/447 - loss 0.00542228 - time (sec): 2.77 - samples/sec: 2990.28 - lr: 0.000011 - momentum: 0.000000
+ 2023-10-13 13:25:11,925 epoch 9 - iter 88/447 - loss 0.00463397 - time (sec): 5.81 - samples/sec: 2922.50 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-13 13:25:14,532 epoch 9 - iter 132/447 - loss 0.00602455 - time (sec): 8.42 - samples/sec: 2961.12 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-13 13:25:17,407 epoch 9 - iter 176/447 - loss 0.00746601 - time (sec): 11.29 - samples/sec: 2945.80 - lr: 0.000009 - momentum: 0.000000
+ 2023-10-13 13:25:20,398 epoch 9 - iter 220/447 - loss 0.00670432 - time (sec): 14.28 - samples/sec: 2945.21 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-13 13:25:23,382 epoch 9 - iter 264/447 - loss 0.00608308 - time (sec): 17.27 - samples/sec: 2921.85 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-13 13:25:26,159 epoch 9 - iter 308/447 - loss 0.00653327 - time (sec): 20.04 - samples/sec: 2942.50 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-13 13:25:29,535 epoch 9 - iter 352/447 - loss 0.00606227 - time (sec): 23.42 - samples/sec: 2950.37 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-13 13:25:32,391 epoch 9 - iter 396/447 - loss 0.00639259 - time (sec): 26.28 - samples/sec: 2943.53 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-13 13:25:35,184 epoch 9 - iter 440/447 - loss 0.00614557 - time (sec): 29.07 - samples/sec: 2940.07 - lr: 0.000006 - momentum: 0.000000
+ 2023-10-13 13:25:35,554 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:25:35,555 EPOCH 9 done: loss 0.0066 - lr: 0.000006
+ 2023-10-13 13:25:43,820 DEV : loss 0.23063816130161285 - f1-score (micro avg) 0.781
+ 2023-10-13 13:25:43,854 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:25:47,476 epoch 10 - iter 44/447 - loss 0.00098734 - time (sec): 3.62 - samples/sec: 2730.27 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-13 13:25:50,549 epoch 10 - iter 88/447 - loss 0.00181722 - time (sec): 6.69 - samples/sec: 2770.81 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-13 13:25:53,340 epoch 10 - iter 132/447 - loss 0.00316493 - time (sec): 9.48 - samples/sec: 2826.86 - lr: 0.000004 - momentum: 0.000000
+ 2023-10-13 13:25:56,007 epoch 10 - iter 176/447 - loss 0.00376538 - time (sec): 12.15 - samples/sec: 2864.66 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-13 13:25:58,830 epoch 10 - iter 220/447 - loss 0.00318066 - time (sec): 14.97 - samples/sec: 2891.81 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-13 13:26:01,539 epoch 10 - iter 264/447 - loss 0.00282318 - time (sec): 17.68 - samples/sec: 2901.48 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-13 13:26:04,410 epoch 10 - iter 308/447 - loss 0.00291001 - time (sec): 20.55 - samples/sec: 2894.73 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-13 13:26:07,417 epoch 10 - iter 352/447 - loss 0.00276025 - time (sec): 23.56 - samples/sec: 2890.66 - lr: 0.000001 - momentum: 0.000000
+ 2023-10-13 13:26:10,104 epoch 10 - iter 396/447 - loss 0.00297373 - time (sec): 26.25 - samples/sec: 2912.16 - lr: 0.000001 - momentum: 0.000000
+ 2023-10-13 13:26:12,949 epoch 10 - iter 440/447 - loss 0.00306137 - time (sec): 29.09 - samples/sec: 2939.98 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-13 13:26:13,356 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:26:13,357 EPOCH 10 done: loss 0.0030 - lr: 0.000000
+ 2023-10-13 13:26:21,613 DEV : loss 0.2350376695394516 - f1-score (micro avg) 0.7849
+ 2023-10-13 13:26:21,645 saving best model
+ 2023-10-13 13:26:22,417 ----------------------------------------------------------------------------------------------------
+ 2023-10-13 13:26:22,419 Loading model from best epoch ...
+ 2023-10-13 13:26:23,884 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
+ 2023-10-13 13:26:28,948
+ Results:
+ - F-score (micro) 0.7551
+ - F-score (macro) 0.6861
+ - Accuracy 0.6253
+
+ By class:
+               precision    recall  f1-score   support
+
+          loc     0.8361    0.8473    0.8417       596
+         pers     0.6658    0.7778    0.7175       333
+          org     0.5620    0.5152    0.5375       132
+         prod     0.6333    0.5758    0.6032        66
+         time     0.6909    0.7755    0.7308        49
+
+    micro avg     0.7388    0.7721    0.7551      1176
+    macro avg     0.6776    0.6983    0.6861      1176
+ weighted avg     0.7397    0.7721    0.7544      1176
+
+ 2023-10-13 13:26:28,948 ----------------------------------------------------------------------------------------------------
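The three averaged F1 rows in the final test report can be cross-checked from the per-class rows. The sketch below is a hedged Python illustration, not part of the commit: it copies the "By class" values from the log and recomputes the macro average (unweighted mean of class F1), the weighted average (mean weighted by support), and the micro F1 (harmonic mean of the reported micro precision and recall). The function names are hypothetical.

```python
# (precision, recall, f1, support) per class, copied from the "By class" table.
BY_CLASS = {
    "loc": (0.8361, 0.8473, 0.8417, 596),
    "pers": (0.6658, 0.7778, 0.7175, 333),
    "org": (0.5620, 0.5152, 0.5375, 132),
    "prod": (0.6333, 0.5758, 0.6032, 66),
    "time": (0.6909, 0.7755, 0.7308, 49),
}


def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(table):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1, _ in table.values()) / len(table)


def weighted_f1(table):
    """Mean of the per-class F1 scores, weighted by class support."""
    total = sum(support for _, _, _, support in table.values())
    return sum(f1 * support for _, _, f1, support in table.values()) / total


print("macro F1:   ", round(macro_f1(BY_CLASS), 4))
print("weighted F1:", round(weighted_f1(BY_CLASS), 4))
# Micro precision and recall are taken from the "micro avg" row of the report.
print("micro F1:   ", round(harmonic_f1(0.7388, 0.7721), 4))
```

Recomputed this way, the values agree with the reported 0.6861 (macro), 0.7544 (weighted) and 0.7551 (micro) to four decimals. The gap between micro and macro F1 comes mostly from the low-support `org` and `prod` classes, which count equally in the macro average.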