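Quick Start: Chat Generation with a Large Language Model

This project uses a small 0.5B-parameter model whose architecture is the same as that of larger models, so you can learn and try out LLM chat generation quickly, even on a CPU-only device.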

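1. Loading the model

The transformers library is used to load a pretrained language model and its matching tokenizer:

AutoModelForCausalLM.from_pretrained loads the pretrained language model; torch_dtype="auto" and device_map="auto" let the library choose a suitable data type and device.
AutoTokenizer.from_pretrained loads the tokenizer that matches the model.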

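from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
# Load the model; "auto" lets the library pick the data type and device placement
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# Load the tokenizer that matches the model
tokenizer = AutoTokenizer.from_pretrained(model_name)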

2. Preparing the input with the tokenizer

The following code sets up the conversation by defining a prompt and a list of messages:

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

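The format above is the conversational format used by many current large language models (LLMs). A conversation consists of multiple messages, each with a role and a content field.
Chat-style APIs from large technology companies such as OpenAI, Google, and Meta commonly use a message list of this kind.

prompt is the user's input.
messages is a list holding the conversation history. Each element is a dictionary with two fields:
- role: the role of the message, one of system, user, or assistant.
  - system: sets the tone, rules, or persona of the conversation, for example the assistant's identity and behavioral guidelines.
  - user: a question or request entered by the user.
  - assistant: a reply generated by the model.
- content: the content of the message.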
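To illustrate how the three roles fit together over several turns, here is a minimal sketch in plain Python. The helper add_turn and the sample reply text are hypothetical, introduced only for illustration; they are not part of transformers.

```python
# Hypothetical helper for illustration; not part of any library.
def add_turn(messages, role, content):
    """Return a new history with one more message appended."""
    if role not in ("system", "user", "assistant"):
        raise ValueError(f"unexpected role: {role}")
    return messages + [{"role": role, "content": content}]

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language model."},
]
# After the model answers, its reply is stored under the assistant role
# (the reply text here is a made-up placeholder) ...
history = add_turn(history, "assistant", "A large language model is ...")
# ... and the follow-up question is appended as another user message.
history = add_turn(history, "user", "Can you give one example?")

print([m["role"] for m in history])
```

Passing the whole history on every call is what lets the model see earlier turns.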

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Output:
# <|im_start|>system
# You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
# <|im_start|>user
# Give me a short introduction to large language model.<|im_end|>
# <|im_start|>assistant

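tokenizer.apply_chat_template converts the structured messages list into the specific text format that the model was trained to understand and process.
That format relies on special control tokens, such as <|im_start|> and <|im_end|>, to mark where each message begins and ends.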
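The layout of that text can be imitated in a few lines of plain Python. This is only a sketch of the format shown in the output: render_chatml is a hypothetical stand-in, and the real template shipped with the tokenizer may handle more cases.

```python
# Sketch of the <|im_start|> / <|im_end|> layout shown in the output.
# render_chatml is a hypothetical stand-in, not the tokenizer's implementation.
def render_chatml(messages, add_generation_prompt=True):
    parts = []
    for message in messages:
        # Each message becomes: start marker, role, newline, content, end marker
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # An open assistant turn tells the model it is its turn to write
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

demo_messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language model."},
]
demo_text = render_chatml(demo_messages)
print(demo_text)
```

With add_generation_prompt=False the sketch stops after the last <|im_end|>, leaving no open assistant turn.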

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Output:
# {'input_ids': tensor([[151644,   8948,    198,   2610,    525,   1207,  16948,     11,   3465,
#             553,  54364,  14817,     13,   1446,    525,    264,  10950,  17847,
#              13, 151645,    198, 151644,    872,    198,  35127,    752,    264,
#            2805,  16800,    311,   3460,   4128,   1614,     13, 151645,    198,
#          151644,  77091,    198]], device='mps:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
#          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='mps:0')}

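The tokenizer encodes the text, typically splitting it into words or subword units.
[text]: the text is wrapped in a list so that the tokenizer treats it as a batch, here a batch of one.
return_tensors="pt": return PyTorch tensors.
model_inputs is the input to the model, a dictionary-like object with two fields, input_ids and attention_mask:
- input_ids: an integer tensor holding the token ID sequence produced by tokenizing the text.
- attention_mask: a tensor of 0s and 1s marking which tokens are real text (1) and which are padding (0).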
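In the output above every mask value is 1 because a batch of one needs no padding. The sketch below shows what happens with two sequences of different lengths. It uses plain lists and made-up token IDs; pad_batch and PAD_ID are hypothetical names, and left padding is an assumption chosen here because it is common for generation.

```python
# Illustrative only: pad_batch and PAD_ID are hypothetical names, and the
# token IDs are made up rather than produced by a real tokenizer.
PAD_ID = 0

def pad_batch(sequences, pad_id=PAD_ID):
    """Left-pad every sequence to the longest length and build the matching mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        # 0 marks padding positions, 1 marks real tokens
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[11, 12, 13, 14], [21, 22]])
print(batch["input_ids"])       # [[11, 12, 13, 14], [0, 0, 21, 22]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [0, 0, 1, 1]]
```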

3. Generating the response

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# print(generated_ids)
# Output:
# tensor([[151644,   8948,    198,   2610,    525,   1207,  16948,     11,   3465,
#             553,  54364,  14817,     13,   1446,    525,    264,  10950,  17847,
#              13, 151645,    198, 151644,    872,    198,  35127,    752,    264,
#            2805,  16800,    311,   3460,   4128,   1614,     13, 151645,    198,
#          151644,  77091,    198,  34253,   4128,   4119,    320,   4086,  21634,
#               8,    525,  20443,  11229,   5942,    429,    646,   6923,   3738,
#           12681,   1467,    389,    862,   1828,     13,   4220,   4119,    525,
#            6188,    311,  55359,    279,  23094,    323,  24177,    315,   5810,
#            4128,   8692,     11,  10693,   1105,    311,   3535,     11,  14198,
#              11,    323,   6923,   3738,   4128,    304,   5257,  37597,     13,
#             444,  10994,     82,    614,   1012,   6839,    311,    387,   7373,
#             304,    264,   6884,   2088,    315,   8357,     11,   2670,   5662,
#           14468,     11,   6236,  61905,     11,  28285,   2022,     11,    323,
#            3405,     12,    596,     86,   4671,   9079,     13,   2379,    646,
#            1882,  12767,  14713,    315,    821,   6157,    323,  29720,     11,
#            3259,   1105,  14452,    369,    264,   8045,    315,   1931,  30084,
#             990,   5048,   1380,   1467,   9471,    374,   2567,     13, 151645]],
#        device='mps:0')

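model.generate: generates new text from model_inputs by predicting the next token step by step, until the requested number of new tokens is reached or a stop condition, such as an end-of-sequence token, is met.
**model_inputs: Python syntax that unpacks the dictionary model_inputs into keyword arguments. For example, if model_inputs = {'input_ids': tensor, 'attention_mask': tensor}, then **model_inputs is equivalent to input_ids=tensor, attention_mask=tensor.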
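Both ideas, the token-by-token loop with its two stopping rules and the ** unpacking, can be seen in a toy sketch. Nothing here calls a real model: toy_generate, next_token, and EOS_ID are hypothetical names, and the "model" simply counts upward.

```python
# Toy sketch: toy_generate and next_token are hypothetical and only mimic
# the stop-on-length / stop-on-EOS behaviour described above.
EOS_ID = 99

def next_token(ids):
    """Stand-in for a model: counts upward, then emits EOS once it has reached 5."""
    return ids[-1] + 1 if ids[-1] < 5 else EOS_ID

def toy_generate(input_ids, attention_mask=None, max_new_tokens=10, eos_token_id=EOS_ID):
    # attention_mask is accepted only to mirror the keyword names; it is unused here.
    output = list(input_ids)
    for _ in range(max_new_tokens):
        token = next_token(output)
        output.append(token)
        if token == eos_token_id:
            break  # stop condition reached before the length limit
    return output

toy_inputs = {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}
# **toy_inputs expands to input_ids=[1, 2, 3], attention_mask=[1, 1, 1]
print(toy_generate(**toy_inputs, max_new_tokens=512))  # [1, 2, 3, 4, 5, 99]
print(toy_generate(**toy_inputs, max_new_tokens=1))    # [1, 2, 3, 4]
```

As in the real output, the returned sequence begins with the input tokens and the new tokens follow them.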

generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
print(generated_ids)
# [tensor([   315,  22870,    323,   3410,  14507,    429,  15148,  55359,   3738,
#           8806,  12624,    382,   9485,   4119,    525,   6188,    311,    387,
#          31945,    323,  93748,     11,   2952,    311,   3705,    264,   6884,
#           2088,    315,   9079,   2670,   1467,   9471,     11,  28285,   2022,
#             11,   3405,     12,    596,     86,   4671,     11,    323,   1496,
#           4378,     13,   2379,    614,   1012,   1483,    304,   5257,   8357,
#           1741,    438,   4108,  56519,     11,   6236,  61905,     11,    323,
#           4128,  14468,   5942,    382,   3966,    315,    279,   1376,   4419,
#            315,    444,  10994,     82,    374,    862,   5726,    311,   3960,
#            504,   3139,    916,    882,     11,    892,   6147,   1105,    311,
#           7269,    862,   5068,    448,  11504,  14338,    311,    501,    821,
#             13,   1096,   3643,   1105,   7945,   5390,    369,   8357,   1380,
#          13403,    323,  40861,    525,   9023,     11,   1741,    438,    304,
#           6002,   2473,    476,   6457,  22982,    382,  27489,     11,   3460,
#           4128,   4119,   4009,    264,   5089,  49825,    304,  15235,   5440,
#             11,  10004,   7988,   7375,    369,  23163,   3738,  12681,  14507,
#            323,   5006,   1095,   6351,   9079,   1526,   5662,   6832,  25185,
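#             13, 151645])]

# Equivalent to (run this instead of the list comprehension above, not after it):
# Initialize an empty list to hold the results
generated_ids_only = []
# Iterate over each (input_ids, output_ids) pair
for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids):
    # Drop the input part and keep only the generated part
    generated_part = output_ids[len(input_ids):]
    # Append the result to the list
    generated_ids_only.append(generated_part)
# Final result
generated_ids = generated_ids_only
print(generated_ids_only)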

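This line removes the input part from the complete token ID sequence returned by the model and keeps only the newly generated part.
It is written as a comprehension over zip so that it also works for batches, that is, several inputs and their generated sequences at once.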
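The slicing is easier to follow on a small batch. The sketch below uses plain lists in place of tensors and made-up token IDs; prompt_ids, full_outputs, and new_tokens are illustrative names only.

```python
# Plain lists stand in for tensors; the IDs are made up for illustration.
prompt_ids = [
    [101, 102, 103],
    [201, 202, 203],
]
# Each output row starts with its own prompt, followed by the new tokens
full_outputs = [
    [101, 102, 103, 7, 8, 9],
    [201, 202, 203, 5, 6, 4],
]
# For every (prompt, output) pair, skip len(prompt) tokens and keep the rest
new_tokens = [out[len(inp):] for inp, out in zip(prompt_ids, full_outputs)]
print(new_tokens)  # [[7, 8, 9], [5, 6, 4]]
```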

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

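batch_decode turns the token ID sequences back into readable text.
skip_special_tokens=True drops special tokens such as <|im_start|> and <|im_end|> and keeps only the actual text.
[0]: batch_decode returns a list, even when there is only one sequence, so [0] selects the first element.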
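The effect of skip_special_tokens and of the trailing [0] can be shown with a toy decoder. The vocabulary and IDs below are made up, not Qwen's, and toy_batch_decode is a hypothetical stand-in for the real tokenizer method.

```python
# Toy vocabulary for illustration; toy_batch_decode is a hypothetical stand-in
# for the real tokenizer, and these IDs are not Qwen's actual token IDs.
VOCAB = {0: "<|im_start|>", 1: "<|im_end|>", 2: "Hello", 3: ",", 4: " world"}
SPECIAL_IDS = {0, 1}

def toy_batch_decode(batch, skip_special_tokens=False):
    """Decode each ID sequence in the batch into one string."""
    texts = []
    for ids in batch:
        kept = [i for i in ids if not (skip_special_tokens and i in SPECIAL_IDS)]
        texts.append("".join(VOCAB[i] for i in kept))
    return texts

toy_batch = [[2, 3, 4, 1]]
print(toy_batch_decode(toy_batch))                               # ['Hello, world<|im_end|>']
print(toy_batch_decode(toy_batch, skip_special_tokens=True)[0])  # Hello, world
```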

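If the input part is not stripped first, decoding the full sequence returns the prompt together with the reply:

# Output without the trimming step: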
# system
# You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
# user
# Give me a short introduction to large language model.
# assistant
# A large language model (LLM) is a type of artificial intelligence that can produce human-like text based on input data. These models use massive amounts of raw text and other types of knowledge to generate coherent and natural-sounding output. LLMs are used in a wide range of applications such as chatbots, virtual assistants, machine translation, and more. They have the ability to learn from vast amounts of data and improve their performance over time.

References

[Tongyi Qianwen 2.0] Fine-tuning: SFT training https://www.bilibili.com/video/BV1JLt2e4EKj
QwenLM/Qwen2.5-README.md https://github.com/QwenLM/Qwen2.5/blob/a7b515534d739f6ebb66c5fe2595862ad7118edb/README.md
Qwen/Qwen2.5-0.5B-Instruct https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct
