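The pitfalls series is back.
While installing deepspeed: the main server was set up without the CUDA compiler and the rest of the CUDA toolchain, so pip could not build the deepspeed package or several of its dependencies.
The exact error: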
$ pip3 install deepspeed -i "http://yum.tbsite.net/pypi/simple/" --trusted-host "yum.tbsite.net"
Looking in indexes: http://yum.tbsite.net/pypi/simple/
Collecting deepspeed
  Downloading http://yum.tbsite.net/pypi/packages/06/b3/a3903de5c5b707170c5c27e1a40f4ef613f14d241bd84d8b151a2a8786f6/deepspeed-0.16.7.tar.gz (1.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 14.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 35, in <module>
        File "/tmp/pip-install-bqu9rp8j/deepspeed_a2abcc73fa3f4d49b5d3a3d2862d6342/setup.py", line 110, in <module>
          cuda_major_ver, cuda_minor_ver = installed_cuda_version()
        File "/tmp/pip-install-bqu9rp8j/deepspeed_a2abcc73fa3f4d49b5d3a3d2862d6342/op_builder/builder.py", line 51, in installed_cuda_version
          raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
      op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
      [end of output]
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
(geocoding)
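Analysis:
nvcc, the CUDA compiler, is not installed. How can this be fixed without touching the root-owned global environment?
It is simple: run nvidia-smi to check the CUDA version, then use conda install to put a cuda-toolkit into the conda environment. Note that nvidia-smi reports the highest CUDA version the driver supports, so pick a toolkit version no newer than that.
After that, pip install deepspeed works normally.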
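Before retrying pip, it can help to confirm that nvcc is actually visible from the active environment. The following is a minimal sketch, not DeepSpeed's own detection logic: `find_cuda_home` is a hypothetical helper, and the assumption that a conda-installed toolkit places the compiler at `$CONDA_PREFIX/bin/nvcc` may not hold for every package or channel.

```python
import os
import shutil
from typing import Mapping, Optional


def find_cuda_home(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a directory that looks like a CUDA toolkit root, or None.

    Hypothetical helper for illustration only. It checks, in order:
    CUDA_HOME, CONDA_PREFIX (assumed conda layout), then nvcc on PATH.
    """
    env = os.environ if env is None else env

    # 1) Explicit roots: accept them only if <root>/bin/nvcc exists.
    for key in ("CUDA_HOME", "CONDA_PREFIX"):
        root = env.get(key)
        if root and os.path.isfile(os.path.join(root, "bin", "nvcc")):
            return root

    # 2) Fall back to searching PATH for an executable named nvcc.
    nvcc = shutil.which("nvcc", path=env.get("PATH", ""))
    if nvcc:
        # nvcc lives at <root>/bin/nvcc, so the root is two levels up.
        return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))

    return None


if __name__ == "__main__":
    home = find_cuda_home()
    if home is None:
        print("nvcc not found: install a CUDA toolkit into the conda env first")
    else:
        print(f"CUDA toolkit root candidate: {home}")
```

If the helper finds a root but the build still reports that CUDA_HOME does not exist, exporting CUDA_HOME to that directory before running pip is a reasonable next thing to try.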