
RuntimeError: Unexpected error from cudaGetDeviceCount

Source: https://blog.csdn.net/engchina/article/details/140369371


  • 0. Introduction
  • 1. Temporary workaround

0. Introduction

With vllm-0.4.2, a multi-GPU deployment ran fine. After upgrading to vllm-0.5.1, startup fails with the error below.
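For reference, a launch of roughly this shape is what spawns the worker processes seen in the log (a minimal sketch using the Python API; the model path and `tensor_parallel_size` value are illustrative, not taken from the original post):

```python
# Hypothetical multi-GPU launch. With tensor_parallel_size > 1, vLLM spawns
# the VllmWorkerProcess workers that appear in the error log below.
from vllm import LLM

llm = LLM(model="/path/to/model", tensor_parallel_size=4)
```

The failing startup then produces: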

(VllmWorkerProcess pid=30692) WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(VllmWorkerProcess pid=30693) WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(VllmWorkerProcess pid=30694) WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
WARNING 07-12 08:16:22 utils.py:562] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(VllmWorkerProcess pid=30693) Process VllmWorkerProcess:
(VllmWorkerProcess pid=30693) Traceback (most recent call last):
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=30693)     self.run()
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=30693)     self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/multiproc_worker_utils.py", line 210, in _run_worker_process
(VllmWorkerProcess pid=30693)     worker = worker_factory()
(VllmWorkerProcess pid=30693)              ^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 68, in _create_worker
(VllmWorkerProcess pid=30693)     wrapper.init_worker(**self._get_worker_kwargs(local_rank, rank,
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 334, in init_worker
(VllmWorkerProcess pid=30693)     self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=30693)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/worker.py", line 85, in __init__
(VllmWorkerProcess pid=30693)     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=30693)                                             ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 217, in __init__
(VllmWorkerProcess pid=30693)     self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=30693)                         ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 45, in get_attn_backend
(VllmWorkerProcess pid=30693)     backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=30693)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py", line 151, in which_attn_to_use
(VllmWorkerProcess pid=30693)     if torch.cuda.get_device_capability()[0] < 8:
(VllmWorkerProcess pid=30693)        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
(VllmWorkerProcess pid=30693)     prop = get_device_properties(device)
(VllmWorkerProcess pid=30693)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
(VllmWorkerProcess pid=30693)     _lazy_init()  # will define _get_device_properties
(VllmWorkerProcess pid=30693)     ^^^^^^^^^^^^
(VllmWorkerProcess pid=30693)   File "/root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
(VllmWorkerProcess pid=30693)     torch._C._cuda_init()
(VllmWorkerProcess pid=30693) RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 2: out of memory
(Workers pid=30692 and pid=30694 fail with identical tracebacks, each ending in the same RuntimeError: Unexpected error from cudaGetDeviceCount() ... Error 2: out of memory.)
ERROR 07-12 08:16:26 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 30693 died, exit code: 1
INFO 07-12 08:16:26 multiproc_worker_utils.py:123] Killing local vLLM worker processes
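The traceback pinpoints the failure: during attention-backend selection, each worker calls `torch.cuda.get_device_capability()`, which triggers `torch._C._cuda_init()` before the worker has been bound to its own GPU, and under WSL that early initialization dies with "Error 2: out of memory". A minimal sketch of the suspected mechanism (assumptions: the fork start method, which these multiproc workers appear to use, and a WSL multi-GPU setup like the one above; it may not reproduce elsewhere):

```python
# Sketch of the suspected failure mode: the parent process initializes CUDA,
# then a forked child re-initializes it via get_device_capability().
import multiprocessing as mp

import torch


def probe() -> None:
    # In the vLLM workers, this is the call that raised
    # "Error 2: out of memory" from cudaGetDeviceCount().
    print(torch.cuda.get_device_capability())


if __name__ == "__main__":
    torch.cuda.init()  # the engine process touches CUDA before forking workers
    proc = mp.get_context("fork").Process(target=probe)
    proc.start()
    proc.join()
```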

1. Temporary workaround

vi /root/miniconda3/envs/vllm2025/lib/python3.10/site-packages/vllm/attention/selector.py

Hardcode the backend to `backend = _Backend.XFORMERS`:

---
# backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
#                             sliding_window, dtype, kv_cache_dtype,
#                             block_size)
backend = _Backend.XFORMERS
---
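Patching site-packages is brittle: the edit is lost on every reinstall or upgrade. An alternative sketch, assuming the `VLLM_ATTENTION_BACKEND` environment variable (which this generation of vLLM's attention selector reads) honors the same choice, so no source edit is needed:

```python
# Alternative to patching selector.py: request the XFORMERS backend via the
# environment. Set the variable before vLLM initializes.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM  # import after setting the variable

llm = LLM(model="/path/to/model", tensor_parallel_size=4)  # illustrative args
```

If the variable is honored, the startup log should report the XFORMERS backend and the selector.py patch becomes unnecessary.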

Done!
