Model Summary

Hanscripter is an instruction-tuned language model focused on translating Classical Chinese (i.e. Wenyanwen, 文言文) into English. See our GitHub repo.

  • Base Model: Meta-Llama-3-8B-Instruct
  • SFT Dataset: KaifengGGG/WenYanWen_English_Parallel
  • Fine-tune Method: QLoRA

Version

Usage
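
A minimal sketch of how the model might be loaded and queried with the transformers library. The model id is taken from this card; the system prompt, example sentence, and generation settings are illustrative assumptions, not tested defaults.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KaifengGGG/Llama3-8b-Hanscripter"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 matches the training precision below
    device_map="auto",
)

# Ask the model to translate a Classical Chinese sentence into English.
messages = [
    {"role": "system", "content": "You translate Classical Chinese (文言文) into English."},
    {"role": "user", "content": "学而时习之，不亦说乎？"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```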

Fine-tuning Details

Below are the parameters and settings used for fine-tuning; each group of settings is followed by a short configuration sketch for reference.

LoRA Parameters

  • lora_r: 64
  • lora_alpha: 16
  • lora_dropout: 0.1
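
For reference, the values above correspond to a PEFT LoraConfig roughly like the sketch below; target_modules is an assumption (a common choice for Llama-style models) and is not stated in this card.

```python
from peft import LoraConfig

# Sketch of the LoRA configuration implied by the values above.
# target_modules is an assumed, typical setting for Llama-style models.
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```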

Quantization

The model is quantized with bitsandbytes to reduce memory use during QLoRA fine-tuning; the relevant flags are listed below, followed by a configuration sketch:

  • use_4bit: True - Enables the use of 4-bit quantization.
  • bnb_4bit_compute_dtype: "float16" - The datatype used for computation in quantized state.
  • bnb_4bit_quant_type: "nf4" - Specifies the quantization type.
  • use_nested_quant: False - Nested quantization is not used.
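
These flags map onto a transformers BitsAndBytesConfig roughly as follows (a sketch of the assumed bitsandbytes integration):

```python
import torch
from transformers import BitsAndBytesConfig

# Sketch of the 4-bit quantization config implied by the flags above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # use_4bit
    bnb_4bit_compute_dtype=torch.float16,  # bnb_4bit_compute_dtype
    bnb_4bit_quant_type="nf4",             # bnb_4bit_quant_type
    bnb_4bit_use_double_quant=False,       # use_nested_quant
)
```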

Training Arguments

Settings for training the model are as follows:

  • num_train_epochs: 10
  • fp16: False
  • bf16: True - Optimized for use with A100 GPUs, employing Brain Floating Point (bf16).
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • gradient_accumulation_steps: 4
  • gradient_checkpointing: True
  • max_grad_norm: 0.3
  • learning_rate: 0.0002
  • weight_decay: 0.001
  • optim: "paged_adamw_32bit"
  • lr_scheduler_type: "cosine"
  • max_steps: -1
  • warmup_ratio: 0.03
  • group_by_length: True
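
A sketch of how these values might be passed to transformers TrainingArguments; output_dir is a hypothetical placeholder not specified in this card.

```python
from transformers import TrainingArguments

# Sketch of the training arguments listed above.
training_args = TrainingArguments(
    output_dir="./hanscripter-qlora",  # assumption: not specified in the card
    num_train_epochs=10,
    fp16=False,
    bf16=True,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
)
```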