metadata

datasets:
  - stingning/ultrachat
language:
  - zh
  - en
library_name: transformers
pipeline_tag: text-generation
tags:
  - MiniCPM
  - ModelBest
  - THUNLP
  - conversational
  - custom_code

MiniCPM-2B-128k

OpenBMB Technical Blog Series

MiniCPM is a series of end-side (on-device) LLMs open-sourced by ModelBest Inc. and TsinghuaNLP. Its main language model, MiniCPM-2B, has only 2.4B non-embedding parameters. MiniCPM-2B-128k is an experimental long-context extension of MiniCPM-2B. To the best of our knowledge, it is the first long-context (>=128k) small language model (SLM) with fewer than 3B parameters. Compared with the previously released MiniCPM-2B, the changes are as follows:

- Supports a 128k context window and achieves the best score among models under 7B on InfiniteBench, a comprehensive long-context benchmark; however, performance on inputs within 4k tokens is lower than that of MiniCPM-2B.
- For the convenience of community developers, the <用户>{} instruction template (<用户> means "user") was replaced during alignment with the ChatML format (<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n). This also makes it easier to deploy and use the model with vLLM's OpenAI-compatible server mode.
- To meet the requirements of the parallelism mechanism, tie_embedding was removed (the input embedding and the output layer no longer share weights), and the vocabulary was expanded to 127660 tokens.

For more details, please refer to the GitHub repo and blog.
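The template change above can be illustrated with a small prompt builder. This is a sketch under our own assumptions, not the model's tokenizer code: the card only states the single-turn template, so closing earlier assistant turns with <|im_end|>\n follows the general ChatML convention and is assumed here. System prompts and any special tokens the tokenizer may add (such as BOS) are not handled, and build_chatml_prompt is a hypothetical helper name.

```python
from typing import List, Tuple


def build_chatml_prompt(history: List[Tuple[str, str]], query: str) -> str:
    """Return a ChatML prompt that ends with an open assistant turn.

    history: (user_message, assistant_reply) pairs from earlier turns.
    query: the new user message.
    Assumption: earlier assistant turns are closed with <|im_end|>\n,
    following general ChatML convention (the card shows only one turn).
    """
    parts = []
    for user_msg, assistant_msg in history:
        parts.append(f"<|im_start|>user\n{user_msg}<|im_end|>\n")
        parts.append(f"<|im_start|>assistant\n{assistant_msg}<|im_end|>\n")
    # The final turn matches the card's single-turn template exactly.
    parts.append(f"<|im_start|>user\n{query}<|im_end|>\n<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_chatml_prompt([("Hi", "Hello! How can I help?")], "What is 2 + 2?"))
```

When the model is served through vLLM's OpenAI-compatible chat endpoint, the server applies a chat template itself, so a builder like this is mainly useful for offline generation or for checking that a deployment's template matches the one stated above.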

Evaluation Results

Model	avg	avg w/o code&math	passkey	number_string	kv_retrieval	longbook_choice_eng	longbook_qa_chn	longbook_qa_eng	longbook_sum_eng	longdialogue_qa_eng	math_find	code_debug	code_run
LWM-Text-128k	24.45	33.62	100	97.8	0.6	28.82	15.93	14.31	9.99	1.5	3.43	20.05	1
Yarn-Mistral-7b-128k	19.84	27.36	92.71		0	27.95	15.49	9.55	9.06	7.5	17.14	0.76	1.25
Mistral-7B-Instruct-v0.2(ABF 1000w)	27.75	36.9	100	78.98	3.6	37.12	11.74	17.37	21.12	9.5	29.43	17.51	0
Yi-6B-200k	22.15	32.54	100	94.92	0	36.68	15.07	9.2	0.92	3.5	4.29	0.51	0.75
chatglm3-6b-128k	25.58	36.57	89.93	99.66	5.2	46.29	10.7	8.38	25.91	6.5	8	5.33	1
MiniCPM-2.4B-128k	27.32	37.68	98.31	99.83	9	29.69	23.06	16.33	15.73	9.5	4.29	22.08	0
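The two average columns can be checked with a few lines of arithmetic. This sketch is our reconstruction, not the original evaluation code. It assumes that avg w/o code&math is the unweighted mean of the eight columns from passkey through longdialogue_qa_eng, and that avg divides the sum of the eleven listed scores by 12, which we take to mean that one InfiniteBench task is not shown in the table. Under these assumptions, the LWM-Text-128k and MiniCPM-2.4B-128k rows reproduce their published averages; other rows are not guaranteed to match (the Yarn-Mistral row, for instance, has an empty number_string cell). The names COLUMNS, ROWS, and averages are hypothetical.

```python
from statistics import mean

# Column order as in the table above.
COLUMNS = ["passkey", "number_string", "kv_retrieval", "longbook_choice_eng",
           "longbook_qa_chn", "longbook_qa_eng", "longbook_sum_eng",
           "longdialogue_qa_eng", "math_find", "code_debug", "code_run"]
CODE_MATH = {"math_find", "code_debug", "code_run"}

# Two rows copied from the table.
ROWS = {
    "LWM-Text-128k": [100, 97.8, 0.6, 28.82, 15.93, 14.31, 9.99, 1.5, 3.43, 20.05, 1],
    "MiniCPM-2.4B-128k": [98.31, 99.83, 9, 29.69, 23.06, 16.33, 15.73, 9.5, 4.29, 22.08, 0],
}


def averages(scores, num_tasks=12):
    """Return (avg, avg w/o code&math) under the assumptions stated above."""
    by_name = dict(zip(COLUMNS, scores))
    no_code_math = [v for k, v in by_name.items() if k not in CODE_MATH]
    # Assumption: avg divides by 12 tasks although only 11 columns are listed.
    return round(sum(scores) / num_tasks, 2), round(mean(no_code_math), 2)


for name, scores in ROWS.items():
    print(name, averages(scores))
```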

Note: We found that generation with Hugging Face Transformers is slightly lower in quality and significantly slower than with vLLM, so we recommend using vLLM for benchmarking.
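Following the recommendation above, a minimal offline-inference sketch with vLLM might look like the following. It assumes a vLLM release that supports the MiniCPM architecture, a CUDA GPU, and the ChatML prompt format described earlier in this card. PROMPT_TEMPLATE, make_prompt, and main are hypothetical names; the sampling values mirror the Transformers example in this card, while max_model_len, max_tokens, and the sample query are illustrative only. vLLM is imported inside main so that the prompt helper can be used without it.

```python
# ChatML single-turn template, as stated in this card.
PROMPT_TEMPLATE = "<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n"


def make_prompt(query: str) -> str:
    """Wrap a single user query in the ChatML template."""
    return PROMPT_TEMPLATE.format(query)


def main() -> None:
    # Imported here so make_prompt works even where vLLM is not installed.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="openbmb/MiniCPM-2B-128k",
        trust_remote_code=True,  # the model ships custom code
        dtype="bfloat16",        # keep the data type explicit
        max_model_len=32768,     # illustrative; raise toward 128k if memory allows
    )
    params = SamplingParams(temperature=0.8, top_p=0.8, max_tokens=512)
    outputs = llm.generate([make_prompt("Summarize the plot of Hamlet in three sentences.")], params)
    # Each RequestOutput holds one or more completions; print the first.
    print(outputs[0].outputs[0].text)
```

Call main() on a machine with vLLM installed. Raising max_model_len toward the full 128k window increases the GPU memory needed for the KV cache.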

Limitations

- Due to its limited size, the model may hallucinate. Because the DPO model tends to generate longer responses, hallucinations are more likely to occur. We will continue to iterate on and improve MiniCPM.
- To keep the model general-purpose for academic research, we did not conduct any identity training. Since part of the training data comes from the open-source ShareGPT corpus, the model may output identity information similar to that of GPT-series models.
- Due to its limited size, the model's output is strongly influenced by the prompt, so multiple attempts may produce inconsistent results.
- Due to its limited capacity, the model's knowledge recall is not always accurate. In the future, we will combine it with retrieval-augmented generation (RAG) to improve its knowledge recall.

Usage

After installing transformers>=4.36.0 and accelerate, run the following code.
Warning: The model's data type must be specified explicitly (torch_dtype) in from_pretrained; otherwise, large numerical errors will result.

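from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(0)  # fix the random seed so sampled outputs are repeatable

path = 'openbmb/MiniCPM-2B-128k'
tokenizer = AutoTokenizer.from_pretrained(path)
# Specify torch_dtype explicitly (see the warning above); device_map='cuda' places the
# model on the GPU (requires accelerate); trust_remote_code=True loads the model's
# custom code, which provides the chat() helper used below
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

# Query (Chinese): "Which is the highest mountain in Shandong Province? Is it higher
# or lower than Huangshan, and by how much?"
response, history = model.chat(tokenizer, "山东省最高的山是哪座山, 它比黄山高还是矮？差距多少？", temperature=0.8, top_p=0.8)
print(response)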
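Because the 128k window is this model's main feature, an input longer than that window has to be shortened before it is passed to model.chat. A minimal sketch follows, under assumptions that are ours rather than the card's: the window is taken as 128 * 1024 = 131072 tokens (use the model config's value if it differs), overflow is removed from the middle of the document so that its beginning and end survive (a common long-context heuristic, not a documented MiniCPM feature), and CONTEXT_WINDOW, truncate_middle, fit_document, and the reserve margin are hypothetical. The tokenizer calls are the standard encode/decode methods of a transformers tokenizer.

```python
from typing import List

# Assumed context window (128 * 1024 tokens); adjust to the model's config if it differs.
CONTEXT_WINDOW = 128 * 1024


def truncate_middle(token_ids: List[int], max_tokens: int) -> List[int]:
    """Drop tokens from the middle so that at most max_tokens remain."""
    if max_tokens <= 0:
        return []
    if len(token_ids) <= max_tokens:
        return token_ids
    head = max_tokens // 2       # tokens kept from the start
    tail = max_tokens - head     # tokens kept from the end (always >= 1 here)
    return token_ids[:head] + token_ids[-tail:]


def fit_document(tokenizer, document: str, question: str, reserve: int = 1024) -> str:
    """Shorten document so that document + question + reserve fits in CONTEXT_WINDOW.

    reserve is a hypothetical margin for the chat template and the generated answer.
    """
    question_len = len(tokenizer.encode(question, add_special_tokens=False))
    budget = CONTEXT_WINDOW - question_len - reserve
    doc_ids = tokenizer.encode(document, add_special_tokens=False)
    return tokenizer.decode(truncate_middle(doc_ids, budget))
```

For example, with the tokenizer and model loaded above, one could call model.chat(tokenizer, fit_document(tokenizer, long_text, question) + "\n\n" + question, temperature=0.8, top_p=0.8). Decoding and re-encoding text does not always reproduce exactly the same token count, which is one reason the sketch keeps a reserve.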