RuntimeError: Unexpected error from cudaGetDeviceCount
- 0. 引言
- 1. 临时解决方法
0. 引言
使用 vllm-0.4.2 部署时,多卡正常运行。升级到 vllm-0.5.1 时,报错如下:
(VllmWorkerProcess pid=30692) WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(VllmWorkerProcess pid=30693) WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(VllmWorkerProcess pid=30694) WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(VllmWorkerProcess pid=30693) Process VllmWorkerProcess:
(VllmWorkerProcess pid=30693) Traceback (most recent call last):
(VllmWorkerProcess pid=30693) File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=30693) self.run()
(VllmWorkerProcess pid=30693) File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=30693) self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=30693) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 210, in _run_worker_process
(VllmWorkerProcess pid=30693) worker = worker_factory()
(VllmWorkerProcess pid=30693) ^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=30693) wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=30693) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 334, in init_worker
(VllmWorkerProcess pid=30693) self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=30693) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker.py", line 85, in __init__
(VllmWorkerProcess pid=30693) self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=30693) ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 217, in __init__
(VllmWorkerProcess pid=30693) self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=30693) ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 45, in get_attn_backend
(VllmWorkerProcess pid=30693) backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=30693) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 151, in which_attn_to_use
(VllmWorkerProcess pid=30693) if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=30693) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=30693) prop = get_device_properties(device)
(VllmWorkerProcess pid=30693) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=30693) _lazy_init() # will define _get_device_properties
(VllmWorkerProcess pid=30693) ^^^^^^^^^^^^
(VllmWorkerProcess pid=30693) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
(VllmWorkerProcess pid=30693) torch._C._cuda_init()
(VllmWorkerProcess pid=30693) RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory
(VllmWorkerProcess pid=30692) Process VllmWorkerProcess:
(VllmWorkerProcess pid=30692) Traceback (most recent call last):
(VllmWorkerProcess pid=30692) File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=30692) self.run()
(VllmWorkerProcess pid=30692) File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=30692) self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=30692) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 210, in _run_worker_process
(VllmWorkerProcess pid=30692) worker = worker_factory()
(VllmWorkerProcess pid=30692) ^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=30692) wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=30692) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 334, in init_worker
(VllmWorkerProcess pid=30692) self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=30692) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker.py", line 85, in __init__
(VllmWorkerProcess pid=30692) self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=30692) ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 217, in __init__
(VllmWorkerProcess pid=30692) self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=30692) ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 45, in get_attn_backend
(VllmWorkerProcess pid=30692) backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=30692) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 151, in which_attn_to_use
(VllmWorkerProcess pid=30692) if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=30692) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=30692) prop = get_device_properties(device)
(VllmWorkerProcess pid=30692) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30692) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=30692) _lazy_init() # will define _get_device_properties
(VllmWorkerProcess pid=30692) ^^^^^^^^^^^^
(VllmWorkerProcess pid=30692) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
(VllmWorkerProcess pid=30692) torch._C._cuda_init()
(VllmWorkerProcess pid=30692) RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory
(VllmWorkerProcess pid=30694) Process VllmWorkerProcess:
(VllmWorkerProcess pid=30694) Traceback (most recent call last):
(VllmWorkerProcess pid=30694) File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=30694) self.run()
(VllmWorkerProcess pid=30694) File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=30694) self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=30694) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 210, in _run_worker_process
(VllmWorkerProcess pid=30694) worker = worker_factory()
(VllmWorkerProcess pid=30694) ^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=30694) wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=30694) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 334, in init_worker
(VllmWorkerProcess pid=30694) self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=30694) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker.py", line 85, in __init__
(VllmWorkerProcess pid=30694) self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=30694) ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 217, in __init__
(VllmWorkerProcess pid=30694) self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=30694) ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 45, in get_attn_backend
(VllmWorkerProcess pid=30694) backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=30694) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 151, in which_attn_to_use
(VllmWorkerProcess pid=30694) if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=30694) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=30694) prop = get_device_properties(device)
(VllmWorkerProcess pid=30694) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30694) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=30694) _lazy_init() # will define _get_device_properties
(VllmWorkerProcess pid=30694) ^^^^^^^^^^^^
(VllmWorkerProcess pid=30694) File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
(VllmWorkerProcess pid=30694) torch._C._cuda_init()
(VllmWorkerProcess pid=30694) RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory
ERROR 07-12 08:16:26 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 30693 died, exit code: 1
INFO 07-12 08:16:26 multiproc_worker_utils.py:123] Killing local vLLM worker processes
1. 临时解决方法
vi /root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py--- 设置成固定的 `backend = _Backend.XFORMERS`。# backend = which_attn_to_use(num_heads, head_size, num_kv_heads,# sliding_window, dtype, kv_cache_dtype,# block_size)backend = _Backend.XFORMERS
---
完结!