Model Summary

Hanscripter is an instruction-tuned language model focused on translating Classical Chinese (i.e. Wenyanwen, 文言文) into English. See our GitHub repo.

  • Base Model: Meta-Llama-3-8B-Instruct
  • SFT Dataset: KaifengGGG/WenYanWen_English_Parallel
  • Fine-tune Method: QLoRA

Version

Usage
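
A minimal sketch of how the model might be loaded and queried with the transformers library. The model id is taken from this card; the system prompt, example sentence, and generation settings are illustrative assumptions, not tested defaults.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KaifengGGG/Llama3-8b-Hanscripter"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 matches the training precision below
    device_map="auto",
)

# Ask the model to translate a Classical Chinese sentence into English.
messages = [
    {"role": "system", "content": "You translate Classical Chinese (文言文) into English."},
    {"role": "user", "content": "学而时习之，不亦说乎？"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```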

Fine-tuning Details

Below are the parameters and settings used for fine-tuning; each group of settings is followed by a short configuration sketch for reference.

LoRA Parameters

  • lora_r: 64
  • lora_alpha: 16
  • lora_dropout: 0.1
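
For reference, the values above correspond to a PEFT LoraConfig roughly like the sketch below; target_modules is an assumption (a common choice for Llama-style models) and is not stated in this card.

```python
from peft import LoraConfig

# Sketch of the LoRA configuration implied by the values above.
# target_modules is an assumed, typical setting for Llama-style models.
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```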

Quantization

The model is quantized with bitsandbytes to reduce memory use during QLoRA fine-tuning; the relevant flags are listed below, followed by a configuration sketch:

  • use_4bit: True - Enables the use of 4-bit quantization.
  • bnb_4bit_compute_dtype: "float16" - The datatype used for computation in quantized state.
  • bnb_4bit_quant_type: "nf4" - Specifies the quantization type.
  • use_nested_quant: False - Nested quantization is not used.
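
These flags map onto a transformers BitsAndBytesConfig roughly as follows (a sketch of the assumed bitsandbytes integration):

```python
import torch
from transformers import BitsAndBytesConfig

# Sketch of the 4-bit quantization config implied by the flags above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # use_4bit
    bnb_4bit_compute_dtype=torch.float16,  # bnb_4bit_compute_dtype
    bnb_4bit_quant_type="nf4",             # bnb_4bit_quant_type
    bnb_4bit_use_double_quant=False,       # use_nested_quant
)
```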

Training Arguments

Settings for training the model are as follows:

  • num_train_epochs: 10
  • fp16: False
  • bf16: True - Optimized for use with A100 GPUs, employing Brain Floating Point (bf16).
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • gradient_accumulation_steps: 4
  • gradient_checkpointing: True
  • max_grad_norm: 0.3
  • learning_rate: 0.0002
  • weight_decay: 0.001
  • optim: "paged_adamw_32bit"
  • lr_scheduler_type: "cosine"
  • max_steps: -1
  • warmup_ratio: 0.03
  • group_by_length: True
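
A sketch of how these values might be passed to transformers TrainingArguments; output_dir is a hypothetical placeholder not specified in this card.

```python
from transformers import TrainingArguments

# Sketch of the training arguments listed above.
training_args = TrainingArguments(
    output_dir="./hanscripter-qlora",  # assumption: not specified in the card
    num_train_epochs=10,
    fp16=False,
    bf16=True,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
)
```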