RuntimeError: Unexpected error from cudaGetDeviceCount

0. 引言
1. 临时解决方法

0. 引言

使用 vllm-0.4.2 部署时，多卡正常运行。升级到 vllm-0.5.1 时，报错如下：

(VllmWorkerProcess pid=30692) WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(VllmWorkerProcess pid=30693) WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(VllmWorkerProcess pid=30694) WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(VllmWorkerProcess pid=30693) Process VllmWorkerProcess:
(VllmWorkerProcess pid=30693) Traceback (most recent call last):
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=30693)     self.run()
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=30693)     self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 210, in _run_worker_process
(VllmWorkerProcess pid=30693)     worker = worker_factory()
(VllmWorkerProcess pid=30693)              ^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=30693)     wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 334, in init_worker
(VllmWorkerProcess pid=30693)     self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=30693)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker.py", line 85, in __init__
(VllmWorkerProcess pid=30693)     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=30693)                                             ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 217, in __init__
(VllmWorkerProcess pid=30693)     self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=30693)                         ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 45, in get_attn_backend
(VllmWorkerProcess pid=30693)     backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=30693)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 151, in which_attn_to_use
(VllmWorkerProcess pid=30693)     if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=30693)        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=30693)     prop = get_device_properties(device)
(VllmWorkerProcess pid=30693)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=30693)     _lazy_init()  # will define _get_device_properties
(VllmWorkerProcess pid=30693)     ^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
(VllmWorkerProcess pid=30693)     torch._C._cuda_init()
(VllmWorkerProcess pid=30693) RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory
(VllmWorkerProcess pid=30692) Process VllmWorkerProcess:
(VllmWorkerProcess pid=30692) Traceback (most recent call last):
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=30692)     self.run()
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=30692)     self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 210, in _run_worker_process
(VllmWorkerProcess pid=30692)     worker = worker_factory()
(VllmWorkerProcess pid=30692)              ^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=30692)     wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 334, in init_worker
(VllmWorkerProcess pid=30692)     self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=30692)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker.py", line 85, in __init__
(VllmWorkerProcess pid=30692)     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=30692)                                             ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 217, in __init__
(VllmWorkerProcess pid=30692)     self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=30692)                         ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 45, in get_attn_backend
(VllmWorkerProcess pid=30692)     backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=30692)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 151, in which_attn_to_use
(VllmWorkerProcess pid=30692)     if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=30692)        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=30692)     prop = get_device_properties(device)
(VllmWorkerProcess pid=30692)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=30692)     _lazy_init()  # will define _get_device_properties
(VllmWorkerProcess pid=30692)     ^^^^^^^^^^^^
(VllmWorkerProcess pid=30692)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
(VllmWorkerProcess pid=30692)     torch._C._cuda_init()
(VllmWorkerProcess pid=30692) RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory
(VllmWorkerProcess pid=30694) Process VllmWorkerProcess:
(VllmWorkerProcess pid=30694) Traceback (most recent call last):
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=30694)     self.run()
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=30694)     self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 210, in _run_worker_process
(VllmWorkerProcess pid=30694)     worker = worker_factory()
(VllmWorkerProcess pid=30694)              ^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=30694)     wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 334, in init_worker
(VllmWorkerProcess pid=30694)     self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=30694)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker.py", line 85, in __init__
(VllmWorkerProcess pid=30694)     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=30694)                                             ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 217, in __init__
(VllmWorkerProcess pid=30694)     self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=30694)                         ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 45, in get_attn_backend
(VllmWorkerProcess pid=30694)     backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=30694)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 151, in which_attn_to_use
(VllmWorkerProcess pid=30694)     if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=30694)        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=30694)     prop = get_device_properties(device)
(VllmWorkerProcess pid=30694)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=30694)     _lazy_init()  # will define _get_device_properties
(VllmWorkerProcess pid=30694)     ^^^^^^^^^^^^
(VllmWorkerProcess pid=30694)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
(VllmWorkerProcess pid=30694)     torch._C._cuda_init()
(VllmWorkerProcess pid=30694) RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory
ERROR 07-12 08:16:26 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 30693 died, exit code: 1
INFO 07-12 08:16:26 multiproc_worker_utils.py:123] Killing local vLLM worker processes

1. 临时解决方法

vi /root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py--- 设置成固定的 `backend = _Backend.XFORMERS`。# backend = which_attn_to_use(num_heads, head_size, num_kv_heads,#                           sliding_window, dtype, kv_cache_dtype,#                            block_size)backend = _Backend.XFORMERS
---

完结！

RuntimeError: Unexpected error from cudaGetDeviceCount

RuntimeError: Unexpected error from cudaGetDeviceCount

0. 引言

1. 临时解决方法

相关资讯

热文排行

最新新闻

推荐新闻

热搜词