Wauplin (HF staff) committed
Commit 72ac185
1 Parent(s): a14ed68

Update README.md

Files changed (1): README.md (+10, -1)
README.md CHANGED
@@ -167,6 +167,7 @@ model-index:
       source:
         url: https://huggingface.co/spaces/lmsys/mt-bench
 ---
+
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
@@ -235,9 +236,12 @@ Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:
 # Install transformers from source - only needed for versions <= v4.34
 # pip install git+https://github.com/huggingface/transformers.git
 # pip install accelerate
+
 import torch
 from transformers import pipeline
+
 pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")
+
 # We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
 messages = [
     {
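The hunk above cuts off inside the `messages` list, so for context, here is a minimal, self-contained sketch of the usage pattern the card describes. The example conversation and sampling parameters are illustrative assumptions; only the `pipeline(...)` call and the chat-template step come from the diff itself:

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative messages; the card's actual example falls outside the diff context.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what a chat template does."},
]

# Format the conversation with the tokenizer's chat template so the prompt
# matches the layout the model was fine-tuned on.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Sampling settings are illustrative defaults, not values from the card.
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```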
@@ -299,6 +303,8 @@ The following hyperparameters were used during training:
 ### Training results
 
 The table below shows the full set of DPO training metrics:
+
+
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
 | 0.6284 | 0.05 | 100 | 0.6098 | 0.0425 | -0.1872 | 0.7344 | 0.2297 | -258.8416 | -253.8099 | -2.7976 | -2.8234 |
@@ -360,6 +366,7 @@ The table below shows the full set of DPO training metrics:
 | 0.0094 | 2.94 | 5700 | 0.7527 | -4.5542 | -8.3509 | 0.7812 | 3.7967 | -340.4790 | -299.7773 | -2.3062 | -2.3510 |
 | 0.0054 | 2.99 | 5800 | 0.7520 | -4.5169 | -8.3079 | 0.7812 | 3.7911 | -340.0493 | -299.4038 | -2.3081 | -2.3530 |
 
+
 ### Framework versions
 
 - Transformers 4.35.0.dev0
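For readers parsing the DPO table in the two hunks above: under DPO, a response's implicit reward is β · (log π_policy − log π_ref), and Rewards/margins is the chosen-minus-rejected gap. A minimal sketch of how those columns are conventionally derived (the β value and function shape are assumptions following the standard DPO formulation, not this repository's training code; the Logps/* columns are typically the policy log-probabilities themselves):

```python
import torch
import torch.nn.functional as F

BETA = 0.1  # assumed DPO beta; the card's hyperparameter list holds the real value

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps):
    # Implicit DPO reward: beta * log(pi_policy / pi_ref), per response.
    rewards_chosen = BETA * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rewards_rejected = BETA * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = rewards_chosen - rewards_rejected                             # Rewards/margins
    accuracies = (margins > 0).float().mean()                               # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                                    # DPO training loss
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracies
```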
@@ -370,6 +377,7 @@ The table below shows the full set of DPO training metrics:
 ## Citation
 
 If you find Zephyr-7B-β is useful in your work, please cite it with:
+
 ```
 @misc{tunstall2023zephyr,
       title={Zephyr: Direct Distillation of LM Alignment},
@@ -382,6 +390,7 @@ If you find Zephyr-7B-β is useful in your work, please cite it with:
 ```
 # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta)
+
 | Metric | Value |
 |-----------------------|---------------------------|
 | Avg. | 52.15 |
@@ -391,4 +400,4 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta)
 | TruthfulQA (0-shot) | 57.45 |
 | Winogrande (5-shot) | 77.74 |
 | GSM8K (5-shot) | 12.74 |
-| DROP (3-shot) | 9.66 |
+| DROP (3-shot) | 9.66 |
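The "Detailed results" link in the hunk above points at a 🤗 dataset; below is a hedged sketch of pulling one per-task table with the `datasets` library. The config name and the `latest` split follow the leaderboard's usual naming conventions and are assumptions, not something stated in this diff:

```python
from datasets import load_dataset

# Repo id comes from the link in the card; the config name is an assumption
# based on the leaderboard's usual "harness_<task>_<n_shot>" naming.
details = load_dataset(
    "open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta",
    "harness_gsm8k_5",
    split="latest",
)
print(details[0])
```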
 