| Time | Version | Author | Description |
|---|---|---|---|
| 2024-08-12 16:58:30 | V0.1 | 宋全恒 | Document created |
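Introduction
This document walks through setting up an evaluation environment for vLLM 0.5.0. Getting this environment to work took considerable effort.
Base image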
(lmdeploy042) yuzailiang@ubuntu:~$ docker run --name vllm050 --gpus all -v /mnt/self-define/:/mnt/self-define -it 10.101.12.128/schen-zhejianglab.com/vllmcusparselt:1.0-dev-nvidia12.4-cudnn8-jupyter-ssh
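Note: the shared directory is mounted for convenience. Frequently used configuration, such as cache directories, can be kept there.
Note: --gpus all makes the host GPUs available inside the container.
With the cache on the shared mount, the same packages do not have to be downloaded again (with a long wait) every time the environment is rebuilt. For example, the torch wheel below is 779 MB; once it has been downloaded, later builds are served from the cache.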
Collecting torch==2.3.0
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/43/e5/2ddae60ae999b224aceb74490abeb885ee118227f866cb12046f0481d4c9/torch-2.3.0-cp310-cp310-manylinux1_x86_64.whl (779.1 MB)
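A colleague advised setting the following before building. It restricts the CUDA kernels that are compiled to compute capabilities 8.0, 8.6, 8.9 and 9.0 (Ampere, Ada Lovelace and Hopper GPUs), so no time is spent building for architectures that are not needed: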
export TORCH_CUDA_ARCH_LIST="8.0 8.6 8.9 9.0"
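If the target GPUs are known, the list can also be derived from the hardware instead of being typed by hand. The sketch below assumes a driver recent enough for nvidia-smi to support the compute_cap query field; arch_list_from_caps is a hypothetical helper, not part of vLLM or PyTorch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collapse compute capabilities (one per line on stdin,
# e.g. "8.0") into a de-duplicated, version-sorted, space-separated list
# suitable for TORCH_CUDA_ARCH_LIST.
arch_list_from_caps() {
  sort -V -u | paste -sd' ' -
}
```

Usage (assumes nvidia-smi supports --query-gpu=compute_cap):
export TORCH_CUDA_ARCH_LIST="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | arch_list_from_caps)"
A build made this way only contains kernels for the GPUs of the build machine. For an image that will be reused on other machines, an explicit list like the one above is the safer choice.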
Downloading the source code
root@74d4cc1d5091:/workspace# git clone https://github.com/yanchenmochen/vllm.git
Cloning into 'vllm'...
remote: Enumerating objects: 27420, done.
remote: Counting objects: 100% (9923/9923), done.
remote: Compressing objects: 100% (1072/1072), done.
remote: Total 27420 (delta 9392), reused 8851 (delta 8851), pack-reused 17497
Receiving objects: 100% (27420/27420), 23.84 MiB | 783.00 KiB/s, done.
Resolving deltas: 100% (20780/20780), done.
root@74d4cc1d5091:/workspace# cd vllm/
root@74d4cc1d5091:/workspace/vllm# git checkout v0.5.0
Note: switching to 'v0.5.0'.
Building and installing vLLM v0.5.0 from source
To see the source build in detail, the following command was used:
root@74d4cc1d5091:/workspace/vllm# pip install -e . --verbose -i https://pypi.tuna.tsinghua.edu.cn/simple --cache-dir /mnt/self-define/songquanheng/pip_dir/cache/
Using pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///workspace/vllm
Running command pip subprocess to install build dependencies
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting cmake>=3.21
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/69/70/242937601f9ff9e6df4c0587b5a7702be4dbfd33420b409d80e2bccc276a/cmake-3.30.2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.9/26.9 MB 3.5 MB/s eta 0:00:00
Collecting ninja
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/6d/92/8d7aebd4430ab5ff65df2bfee6d5745f95c004284db2d8ca76dcbfd9de47/ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.2/307.2 KB 2.7 MB/s eta 0:00:00
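This also populates the download cache. The options used:
- -e . installs the package in editable (development) mode, so changes to the Python sources take effect without reinstalling.
- --verbose prints the installation steps in more detail.
- -i https://pypi.tuna.tsinghua.edu.cn/simple uses the Tsinghua mirror to speed up downloads.
- --cache-dir /mnt/self-define/songquanheng/pip_dir/cache/ keeps downloaded packages on the shared mount, so later builds reuse them instead of downloading them again.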
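To avoid repeating -i and --cache-dir on every command, the same two settings can live in a pip configuration file, where keys in the [global] section mirror pip's long command-line options. A minimal sketch; write_pip_conf is a hypothetical helper, and the config path is your choice: $HOME/.config/pip/pip.conf is one of the per-user locations pip reads on Linux, and the PIP_CONFIG_FILE environment variable can point pip at a file elsewhere (for example on the shared mount, so new containers pick it up).

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a pip config file that sets a default index URL
# and a persistent cache directory (same effect as -i and --cache-dir).
write_pip_conf() {
  local conf_path=$1 index_url=$2 cache_dir=$3
  mkdir -p "$(dirname "$conf_path")"
  cat > "$conf_path" <<EOF
[global]
index-url = $index_url
cache-dir = $cache_dir
EOF
}
```

Usage:
write_pip_conf "$HOME/.config/pip/pip.conf" https://pypi.tuna.tsinghua.edu.cn/simple /mnt/self-define/songquanheng/pip_dir/cache/
pip config list
pip cache dir
After that, pip install -e . --verbose uses the mirror and the shared cache without extra flags; pip config list should show global.index-url and global.cache-dir.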
Problems and solutions
Installing lm-eval from source
Installation failure
root@74d4cc1d5091:/mnt/self-define/songquanheng/lm-evaluation-harness# pip install -e .
Obtaining file:///mnt/self-define/songquanheng/lm-evaluation-harness
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Installing collected packages: UNKNOWN
Running setup.py develop for UNKNOWN
Successfully installed UNKNOWN-0.0.0
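The install reports success, but the package is registered as UNKNOWN-0.0.0: the project name and version from pyproject.toml were not picked up, and pip fell back to the legacy setup.py develop path. The fix was to upgrade pip (the image's pip 22.0.2 is the Ubuntu system package in /usr/lib/python3/dist-packages) and check setuptools: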
root@74d4cc1d5091:/mnt/self-define/songquanheng/lm-evaluation-harness# python -m pip install --upgrade pip
Requirement already satisfied: pip in /usr/lib/python3/dist-packages (22.0.2)
Collecting pip
Downloading pip-24.2-py3-none-any.whl (1.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 4.4 MB/s eta 0:00:00
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 22.0.2
Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
Can't uninstall 'pip'. No files were found to uninstall.
Successfully installed pip-24.2
root@74d4cc1d5091:/mnt/self-define/songquanheng/lm-evaluation-harness# pip install setuptools --upgrade
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (72.1.0)
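Because the failure came from the old system pip, a fresh container can be checked before any source install and pip upgraded only when needed. A sketch: the threshold MIN_PIP=23.0 is an assumption, not a minimum version established by these notes, the helper names are hypothetical, and the comparison relies on GNU sort -V.

```shell
#!/usr/bin/env bash
# Minimum pip version to accept (assumed threshold; override via MIN_PIP).
MIN_PIP=${MIN_PIP:-23.0}

# Hypothetical helper: succeed if version $1 >= version $2,
# using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: succeed if the "pip X.Y.Z from ..." line in $1
# reports a version below MIN_PIP.
needs_pip_upgrade() {
  local current
  current=$(awk '{print $2}' <<<"$1")
  ! version_ge "$current" "$MIN_PIP"
}
```

Usage:
if needs_pip_upgrade "$(python -m pip --version)"; then python -m pip install --upgrade pip setuptools; fi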
Running pip install -e . again now essentially succeeds:
Obtaining file:///mnt/self-define/songquanheng/lm-evaluation-harness
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... done
Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: accelerate>=0.26.0 in /usr/local/lib/python3.10/dist-packages (from lm_eval==0.4.3) (0.28.0)
Collecting evaluate (from lm_eval==0.4.3)
Downloading evaluate-0.4.2-py3-none-any.whl.metadata (9.3 kB)
Requirement already satisfied: datasets>=2.16.0 in /usr/local/lib/python3.10/dist-packages (from lm_eval==0.4.3) (2.18.0)
Collecting jsonlines (from lm_eval==0.4.3)
Downloading jsonlines-4.0.0-py3-none-any.whl.metadata (1.6 kB)
Collecting numexpr (from lm_eval==0.4.3)
Downloading numexpr-2.10.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.2 kB)
Collecting peft>=0.2.0 (from lm_eval==0.
...
Successfully built lm_eval rouge-score sqlitedict word2number
Installing collected packages: word2number, sqlitedict, zstandard, threadpoolctl, tcolorpy, tabulate, pybind11, portalocker, pathvalidate, numexpr, nltk, more-itertools, lxml, jsonlines, colorama, chardet, absl-py, tqdm-multiprocess, scikit-learn, sacrebleu, rouge-score, mbstrdecoder, typepy, peft, evaluate, DataProperty, tabledata, pytablewriter, lm_eval
Successfully installed DataProperty-1.0.1 absl-py-2.1.0 chardet-5.2.0 colorama-0.4.6 evaluate-0.4.2 jsonlines-4.0.0 lm_eval-0.4.3 lxml-5.3.0 mbstrdecoder-1.1.3 more-itertools-10.4.0 nltk-3.8.2 numexpr-2.10.1 pathvalidate-3.2.0 peft-0.12.0 portalocker-2.10.1 pybind11-2.13.1 pytablewriter-1.2.0 rouge-score-0.1.2 sacrebleu-2.4.2 scikit-learn-1.5.1 sqlitedict-2.1.0 tabledata-1.3.3 tabulate-0.9.0 tcolorpy-0.1.6 threadpoolctl-3.5.0 tqdm-multiprocess-0.0.11 typepy-1.3.2 word2number-1.1 zstandard-0.23.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
Problem and solution
root@74d4cc1d5091:/mnt/self-define/songquanheng/lm-evaluation-harness# lm-eval --tasks list
Traceback (most recent call last):
  File "/usr/local/bin/lm-eval", line 5, in <module>
    from lm_eval.__main__ import cli_evaluate
  File "/mnt/self-define/songquanheng/lm-evaluation-harness/lm_eval/__init__.py", line 1, in <module>
    from .evaluator import evaluate, simple_evaluate
  File "/mnt/self-define/songquanheng/lm-evaluation-harness/lm_eval/evaluator.py", line 12, in <module>
    import lm_eval.api.metrics
  File "/mnt/self-define/songquanheng/lm-evaluation-harness/lm_eval/api/metrics.py", line 12, in <module>
    from lm_eval.api.registry import register_aggregation, register_metric
  File "/mnt/self-define/songquanheng/lm-evaluation-harness/lm_eval/api/registry.py", line 4, in <module>
    import evaluate as hf_evaluate
  File "/usr/local/lib/python3.10/dist-packages/evaluate/__init__.py", line 29, in <module>
    from .evaluation_suite import EvaluationSuite
  File "/usr/local/lib/python3.10/dist-packages/evaluate/evaluation_suite/__init__.py", line 10, in <module>
    from ..evaluator import evaluator
  File "/usr/local/lib/python3.10/dist-packages/evaluate/evaluator/__init__.py", line 17, in <module>
    from transformers.pipelines import SUPPORTED_TASKS as SUPPORTED_PIPELINE_TASKS
  File "/usr/local/lib/python3.10/dist-packages/transformers/pipelines/__init__.py", line 26, in <module>
    from ..image_processing_utils import BaseImageProcessor
  File "/usr/local/lib/python3.10/dist-packages/transformers/image_processing_utils.py", line 21, in <module>
    from .image_transforms import center_crop, normalize, rescale
  File "/usr/local/lib/python3.10/dist-packages/transformers/image_transforms.py", line 22, in <module>
    from .image_utils import (
  File "/usr/local/lib/python3.10/dist-packages/transformers/image_utils.py", line 58, in <module>
    from torchvision.transforms import InterpolationMode
  File "/usr/local/lib/python3.10/dist-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "/usr/local/lib/python3.10/dist-packages/torchvision/_meta_registrations.py", line 164, in <module>
    def meta_nms(dets, scores, iou_threshold):
  File "/usr/local/lib/python3.10/dist-packages/torch/library.py", line 467, in inner
    handle = entry.abstract_impl.register(func_to_register, source)
  File "/usr/local/lib/python3.10/dist-packages/torch/_library/abstract_impl.py", line 30, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
RuntimeError: operator torchvision::nms does not exist
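For the fix, see the PyTorch Forums thread "RuntimeError: operator torchvision::nms does not exist" (vision category). The error usually means that the installed torchvision was built for a different torch version than the one now installed. Installing vLLM 0.5.0 pulled in torch 2.3.0 (see the download earlier), whose matching torchvision release is 0.18, so reinstalling the matching torchvision typically resolves it.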
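To detect the mismatch without importing torchvision (which is exactly the step that fails), the installed versions can be compared against the torch/torchvision release pairing. A sketch, assuming the small pairing table below (torch 2.1 to 2.4 paired with torchvision 0.16 to 0.19) covers the versions in use; expected_torchvision and torch_tv_match are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the torchvision minor release matching a torch
# version. Partial table covering only the releases listed here.
expected_torchvision() {
  case "${1%%+*}" in          # strip a local suffix such as +cu121
    2.1.*) echo 0.16 ;;
    2.2.*) echo 0.17 ;;
    2.3.*) echo 0.18 ;;
    2.4.*) echo 0.19 ;;
    *) return 1 ;;            # unknown torch version
  esac
}

# Hypothetical helper: succeed if torchvision version $2 matches torch version $1.
torch_tv_match() {
  local want
  want=$(expected_torchvision "$1") || return 1
  case "${2%%+*}" in
    "$want" | "$want".*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage (reads versions via pip show so torchvision is never imported):
torch_v=$(pip show torch | awk '/^Version:/ {print $2}')
tv_v=$(pip show torchvision | awk '/^Version:/ {print $2}')
torch_tv_match "$torch_v" "$tv_v" || pip install "torchvision==0.18.0"
The pinned version in the last line is for torch 2.3.0 only; for other torch versions use the matching release from the table.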
Functional check
root@74d4cc1d5091:/mnt/self-define/songquanheng/lm-evaluation-harness# lm-eval --tasks list
| Group | Config Location |
|---------------------------------|------------------------------------------------------------------------|
|aclue |lm_eval/tasks/aclue/_aclue.yaml |
|aexams |lm_eval/tasks/aexams/_aexams.yaml |
|agieval |lm_eval/tasks/agieval/agieval.yaml |
|agieval_cn |lm_eval/tasks/agieval/agieval_cn.yaml |
|agieval_en |lm_eval/tasks/agieval/agieval_en.yaml |
|agieval_nous |lm_eval/tasks/agieval/agieval_nous.yaml |
|arabicmmlu |lm_eval/tasks/arabicmmlu/_arabicmmlu.yaml |
|arabicmmlu_humanities |lm_eval/tasks/arabicmmlu/_arabicmmlu_humanities.yaml |
|arabicmmlu_language |lm_eval/tasks/arabicmmlu/_arabicmmlu_language.yaml |
|arabicmmlu_other |lm_eval/tasks/arabicmmlu/_arabicmmlu_other.yaml |
|arabicmmlu_social_science |lm_eval/tasks/arabicmmlu/_arabicmmlu_social_science.yaml |
...
This shows that lm-eval itself basically works.
Verifying model accuracy
root@74d4cc1d5091:/mnt/self-define/songquanheng/lm-evaluation-harness# lm-eval --model vllm --model_args pretrained=/mnt/self-define/zhangweixing/model/llama2-7b-hf,gpu_memory_utilization=0.8 --tasks arc_easy --device cuda:0
INFO 08-12 12:25:34 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='/mnt/self-define/zhangweixing/model/llama2-7b-hf', speculative_config=None, tokenizer='/mnt/self-define/zhangweixing/model/llama2-7b-hf', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=1234, served_model_name=/mnt/self-define/zhangweixing/model/llama2-7b-hf)
INFO 08-12 12:27:41 model_runner.py:159] Loading model weights took 12.5523 GB
INFO 08-12 12:27:42 gpu_executor.py:83] # GPU blocks: 2345, # CPU blocks: 512
INFO 08-12 12:27:44 model_runner.py:878] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 08-12 12:27:44 model_runner.py:882] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 08-12 12:27:57 model_runner.py:954] Graph capturing finished in 13 secs.
Downloading readme: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.00k/9.00k [00:00<00:00, 15.9MB/s]
Generating test split: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2376/2376 [00:00<00:00, 290324.14 examples/s]
Generating validation split: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 570/570 [00:00<00:00, 182361.04 examples/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2376/2376 [00:01<00:00, 1406.79it/s]
Running loglikelihood requests: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9501/9501 [02:15<00:00, 70.26it/s]
fatal: detected dubious ownership in repository at '/mnt/self-define/songquanheng/lm-evaluation-harness'
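To add an exception for this directory, call:
git config --global --add safe.directory /mnt/self-define/songquanheng/lm-evaluation-harness
This message comes from git, not from the evaluation: lm-eval tries to record the git commit of the harness checkout, and git refuses to read a repository owned by a different user than the current one (root in the container). The results below are unaffected; running the suggested git config command once in the container silences it.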
vllm (pretrained=/mnt/self-define/zhangweixing/model/llama2-7b-hf,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|--------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_easy| 1|none | 0|acc |↑ |0.7630|± |0.0087|
| | |none | 0|acc_norm|↑ |0.7458|± |0.0089|
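At this point the vLLM 0.5.0 + lm-eval evaluation image is complete. On top of this source-based development image, lm-eval can be used to compare the accuracy of quantized and original models, and vLLM's own benchmark scripts can be used to measure time to first token and inference latency.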
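When comparing an original and a quantized model, the accuracy can be pulled out of the results table lm-eval prints. A sketch, assuming the column layout shown above (Metric in the 5th column, Value in the 7th); extract_metric and metric_delta are hypothetical helpers. For anything beyond a quick check, writing results with lm-eval's --output_path option and reading the JSON is more robust than parsing the printed table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Value column for rows whose Metric column
# equals $1, reading an lm-eval markdown results table on stdin.
# With -F'|' the leading pipe makes $1 empty, so Metric is field 6
# and Value is field 8.
extract_metric() {
  awk -F'|' -v m="$1" '{
    metric = $6; value = $8
    gsub(/[[:space:]]/, "", metric)
    gsub(/[[:space:]]/, "", value)
    if (metric == m) print value
  }'
}

# Hypothetical helper: print a - b rounded to 4 decimals
# (e.g. baseline accuracy minus quantized accuracy).
metric_delta() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.4f\n", a - b }'
}
```

Usage (file names are placeholders): save each run with lm-eval ... | tee base.txt and lm-eval ... | tee quant.txt, then
metric_delta "$(extract_metric acc < base.txt)" "$(extract_metric acc < quant.txt)"
With several tasks in one run, extract_metric prints one value per matching row.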
Backing up the image to Harbor
(lmdeploy042) yuzailiang@ubuntu:~$ docker push 10.200.88.53/framework/vllm:0.5.0-lm_eval-0.4.3-ssh
The push refers to repository [10.200.88.53/framework/vllm]
679b7252e87c: Pushing [> ] 53.92MB/7.034GB
50bbce084879: Pushing [> ] 42.99MB/30.22GB
47c82924cf56: Pushed
ebf20e4fb8d4: Pushing [===> ] 40.63MB/584.8MB
8b9803501a26: Pushing [======> ] 40.87MB/303.9MB
eb4697a44dd2: Pushing [====> ] 35.02MB/404.3MB
b7ad7c045853: Waiting
816e34807296: Waiting
09e47d21a1ca: Waiting
594f9ac14b13: Waiting
600c676771a0: Waiting
6ac15100dff6: Waiting
40f0eb1871b9: Waiting
8d113b7b997c: Waiting
cd77f58b80cd: Waiting
e4b1bddcbe63: Waiting
765423415d69: Waiting
7b9433fba79b: Waiting
256d88da4185: Waiting
Back up the image regularly so that work is not lost.
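docker push only uploads an image that already exists locally, so the container's current state has to be committed to that tag first. A sketch, assuming the container is the vllm050 one started at the beginning and that docker login 10.200.88.53 has already been done; image_ref and backup_container are hypothetical helpers, and DRY_RUN=1 only prints the docker commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose the image reference using the naming above,
#   <registry>/<project>/vllm:<vllm version>-lm_eval-<lm_eval version>-ssh
image_ref() {
  printf '%s/%s/vllm:%s-lm_eval-%s-ssh\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: snapshot a container into an image and push it.
# With DRY_RUN=1 the docker commands are printed instead of executed.
backup_container() {
  local container=$1 ref=$2
  local -a run=()
  [ "${DRY_RUN:-0}" = 1 ] && run=(echo)
  "${run[@]}" docker commit "$container" "$ref" &&
    "${run[@]}" docker push "$ref"
}
```

Usage:
backup_container vllm050 "$(image_ref 10.200.88.53 framework 0.5.0 0.4.3)"
Keep in mind that docker commit does not capture data in mounted volumes such as /mnt/self-define: anything that lives only on the shared mount (the pip cache, or a checkout installed in editable mode from there) is not part of the image.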
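Starting a container with port mapping and storage mounts
-d runs the container in the background, -v mounts the shared directory as before, and -p 38022:22 publishes the container's SSH port 22 on host port 38022 (an earlier attempt used the name smoothquant and port 8022):
docker run -d --name smoothquant --gpus all -v /mnt/self-define:/mnt/self-define -p 8022:22 -it 10.200.88.53/framework/vllm:0.5.0-lm_eval-0.4.3-ssh
(base) yuzailiang@ubuntu:~$ docker run -d --name vllm-smoothquant --gpus all -v /mnt/self-define:/mnt/self-define -p 38022:22 -it 10.200.88.53/framework/vllm:0.5.0-lm_eval-0.4.3-ssh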
e483f52b9cb0942540b1ff205688e8b6588aa723eae494efe660f25c7846d88a
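Once sshd is running in the container, a VS Code Remote-SSH (or plain ssh) client can reach it through the published port with an entry in the client's ~/.ssh/config (on Windows, %USERPROFILE%\.ssh\config). A sketch; the alias and host address are placeholders, and ssh_config_entry is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an OpenSSH client config block for a container
# whose port 22 is published on <host>:<port> (docker run -p <port>:22).
ssh_config_entry() {
  local alias=$1 host=$2 port=$3 user=${4:-root}
  printf 'Host %s\n    HostName %s\n    Port %s\n    User %s\n' \
    "$alias" "$host" "$port" "$user"
}
```

Usage (replace <docker-host-ip> with the Docker host's address):
ssh_config_entry vllm-smoothquant <docker-host-ip> 38022 >> ~/.ssh/config
Then ssh vllm-smoothquant, or pick the alias in VS Code via "Remote-SSH: Connect to Host...".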
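Summary
With the image produced as above, environments can from now on be created by installing from source, saved as images, and reused. Along the way the notes also demonstrated pip's download cache, a very convenient and effective way to speed up rebuilds.
To recap, this article covers:
- Building and installing vLLM 0.5.0 from source
- Building and installing lm-evaluation-harness from source, to evaluate the accuracy of large models and their quantized versions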
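Setting up sshd in the container
For reference, see the Evernote note "07-09 Tue: adding openssh to a container started from the image, debugging a Python project with VS Code breakpoints, configuring a container with vLLM installed from source".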
root@74d4cc1d5091:/mnt/self-define/songquanheng# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
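This shows the container runs Ubuntu 22.04 (Jammy Jellyfish).
The Aliyun mirror entries for jammy (to go in /etc/apt/sources.list) are: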
deb http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
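The same entries can be generated from the codename in /etc/os-release, so nothing is hard-coded if the base image moves to another Ubuntu release. A sketch, assuming the entries belong in /etc/apt/sources.list as noted above; codename_of and aliyun_sources are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print VERSION_CODENAME from an os-release file
# (default /etc/os-release). os-release uses shell syntax, so it is
# sourced in a subshell to avoid polluting the caller's variables.
codename_of() {
  ( . "${1:-/etc/os-release}" && printf '%s\n' "$VERSION_CODENAME" )
}

# Hypothetical helper: print deb/deb-src lines for the Aliyun Ubuntu mirror,
# covering the same five suites as the list above.
aliyun_sources() {
  local codename=$1 suite
  for suite in "$codename" "$codename-security" "$codename-updates" \
               "$codename-proposed" "$codename-backports"; do
    printf 'deb http://mirrors.aliyun.com/ubuntu/ %s main restricted universe multiverse\n' "$suite"
    printf 'deb-src http://mirrors.aliyun.com/ubuntu/ %s main restricted universe multiverse\n' "$suite"
  done
}
```

Usage (keeps a copy of the original file first):
cp /etc/apt/sources.list /etc/apt/sources.list.bak
aliyun_sources "$(codename_of)" > /etc/apt/sources.list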
Back up, and thereby disable, the existing source files in /etc/apt/sources.list.d:
root@74d4cc1d5091:/etc/apt/sources.list.d# ls | xargs -I {} mv {} {}.bak
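Renaming disables these files because apt only reads files ending in .list or .sources from that directory. The ls | xargs form, however, also renames files that are already disabled (producing *.bak.bak) if it is run twice. A sketch that only touches active source files and is safe to re-run; disable_apt_sources is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disable every active apt source file in a directory
# by appending .bak. Files already ending in .bak are never matched,
# so running it again is harmless.
disable_apt_sources() {
  local dir=${1:-/etc/apt/sources.list.d} f
  for f in "$dir"/*.list "$dir"/*.sources; do
    [ -e "$f" ] || continue   # pattern matched nothing
    mv -- "$f" "$f.bak"
  done
}
```

Usage: disable_apt_sources (defaults to /etc/apt/sources.list.d).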
Update the package index and install the SSH server:
apt-get update
apt-get install openssh-server
Set the SSH port and allow root login:
echo "Port 22" >> /etc/ssh/sshd_config
echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
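Appending works on the stock Ubuntu sshd_config, where both directives are commented out. However, sshd uses the first value it reads for each keyword, so an earlier active line would silently win, and re-running the echo commands adds duplicates. A sketch of an idempotent alternative: it replaces the first active occurrence of the keyword, otherwise inserts it before the first Match block (directives after Match only apply to that block), otherwise appends. set_sshd_option is a hypothetical helper; it does not handle the Key=value form or files pulled in via Include.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "Key value" in an sshd_config-style file
# without creating duplicates. Keywords are compared case-insensitively.
set_sshd_option() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    # Reached the first Match block without seeing the key: insert it here,
    # because anything after "Match" only applies to that block.
    !done && tolower($1) == "match" { print k " " v; done = 1 }
    # First active (uncommented) occurrence of the key: replace its value.
    !done && tolower($1) == tolower(k) { print k " " v; done = 1; next }
    { print }
    END { if (!done) print k " " v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage, followed by a syntax check of the resulting configuration:
set_sshd_option /etc/ssh/sshd_config Port 22
set_sshd_option /etc/ssh/sshd_config PermitRootLogin yes
/usr/sbin/sshd -t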
Change the root password (here it was set to tianshu@123):
root@a9a7cd77c4ee:~# passwd
New password:
Retype new password:
passwd: password updated successfully
Start (restart) the SSH service:
/etc/init.d/ssh restart
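Configuring password-less login for VS Code
See the note "07-16 VS Code: configuring SSH connections to a remote server + password-less login tutorial". The client public key to authorize: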
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDUZN3Oh46GlQJlG8FGxYWhl9Xvj3Y0gJ2twSpIUA9ukpXySWpVjQ8am3NZjt1lKL5qVFcRobn8hpPwwZ5coFSN8qon228f85eIWCRSMRvqFpoHfLzC5qHG6hwdq0LXKLfj68q5xNKnSZ3MnB7wA4nTBz1bA5vcq//be3nrGzW5DMl8miwmAvJ0P4xasPPB2iePe6Y2DEHtSgTD3yMGTefq1IzaeZaVEGsrSI8J57vzhqFjOpAnwcPFGwXq/RAESchUX/WHJ498bRijDLCrvYPNQlIzwjx8C74Tj6w/cp8QO2sSRVtuKRf3cuHyB7B69+mUYzrgGHqi7JBGuGSNlMCZ zj@DESKTOP-L6VJN12
Then run the following commands and paste the public key above into ~/.ssh/authorized_keys:
root@74d4cc1d5091:/etc/apt/sources.list.d# mkdir ~/.ssh
root@74d4cc1d5091:/etc/apt/sources.list.d# vim ~/.ssh/authorized_keys
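With its default StrictModes setting, sshd ignores authorized_keys if the file or ~/.ssh is writable by group or others, so the permissions matter as much as the content. A sketch of the same step with explicit permissions, which can also be re-run without duplicating the key; install_pubkey is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to <home>/.ssh/authorized_keys
# unless it is already present, with 700 on the directory and 600 on the file.
install_pubkey() {
  local home=$1 key=$2 dir file
  dir=$home/.ssh
  file=$dir/authorized_keys
  mkdir -p "$dir" && chmod 700 "$dir"
  touch "$file" && chmod 600 "$file"
  grep -qxF -- "$key" "$file" || printf '%s\n' "$key" >> "$file"
}
```

Usage (paste the full key line shown above as the second argument):
install_pubkey /root "ssh-rsa AAAA... zj@DESKTOP-L6VJN12"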
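Summary
This document describes the complete process of building the vLLM framework inside a Docker image. The resulting artifact is:
10.200.88.53/framework/vllm:0.5.0-lm_eval-0.4.3-ssh
The image is based on vLLM 0.5.0 and also includes lm_eval 0.4.3. Both frameworks were built and installed from source and are located under /workspace. Note, though, that the session recorded above installed lm-evaluation-harness in editable mode from /mnt/self-define/songquanheng/lm-evaluation-harness on the shared mount, so it is worth confirming where lm_eval resolves in a fresh container. For further accuracy and inference testing on top of vLLM, this image is a ready starting point.
The document also describes configuring sshd in the container and password-less VS Code login, so that VS Code can connect directly to the container for development later.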