Deploying ONNX Models with C++ on Linux

1. Converting the PyTorch model to ONNX

Install directly with pip.

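In this step it is best to wrap every operation inside the ONNX model, including data preprocessing and postprocessing, because C++ does not necessarily offer a drop-in replacement for every PyTorch function.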
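If some preprocessing does stay outside the model, it has to be rewritten by hand in C++. As an illustration, here is a minimal sketch of the usual image normalization step, assuming an already-resized RGB image stored as interleaved uint8 (HWC) and a model that expects planar float input (CHW). The function name HwcToNormalizedChw is hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Convert an interleaved RGB image (HWC, uint8) into a planar float tensor (CHW)
// and normalize each channel: (pixel / 255 - mean[c]) / stdev[c].
// Resizing and cropping are NOT done here; the image is assumed to already
// have the model's input resolution.
std::vector<float> HwcToNormalizedChw(const std::vector<uint8_t>& rgb,
                                      std::size_t height, std::size_t width,
                                      const float mean[3], const float stdev[3]) {
    if (rgb.size() != height * width * 3) {
        throw std::invalid_argument("buffer size does not match height*width*3");
    }
    const std::size_t plane = height * width;  // elements per channel plane
    std::vector<float> chw(3 * plane);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t pixel = y * width + x;
            for (std::size_t c = 0; c < 3; ++c) {
                const float v = rgb[pixel * 3 + c] / 255.0f;  // scale to [0, 1]
                chw[c * plane + pixel] = (v - mean[c]) / stdev[c];
            }
        }
    }
    return chw;
}
```

For CLIP, the commonly used values are mean (0.48145466, 0.4578275, 0.40821073) and std (0.26862954, 0.26130258, 0.27577711); check them against the preprocessing your own model was trained with. Each of these details (resize mode, crop, channel order, normalization constants) is a place where C++ and PyTorch can silently diverge, which is the argument for exporting preprocessing inside the ONNX graph.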

2. Installing CUDA + cuDNN on Ubuntu 22.04

This step is about aligning versions, so that a PyTorch model can be converted to ONNX and then run for inference in a C++ environment.

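My PyTorch version (installed under Anaconda) is 2.4.1+cu121, i.e. a build for CUDA 12.1.

On Ubuntu, however, the installed CUDA version was 10.1, and cuDNN was not installed: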

$ nvcc --version
# Cuda compilation tools, release 10.1, V10.1.243
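This mismatch can be checked mechanically: the +cu121 suffix of the PyTorch version names the CUDA version the wheel was built for, and nvcc --version reports the installed toolkit. A small sketch that extracts both version strings; the helper names are hypothetical and the parsing assumes exactly the formats shown above:

```cpp
#include <optional>
#include <regex>
#include <string>

// Extract the CUDA version a PyTorch wheel was built for from its version tag,
// e.g. "2.4.1+cu121" -> "12.1". Assumes the "+cuXY[Z]" convention, in which
// the last digit is the minor version.
std::optional<std::string> CudaFromTorchTag(const std::string& tag) {
    std::smatch m;
    if (!std::regex_search(tag, m, std::regex(R"(\+cu(\d+)(\d))"))) {
        return std::nullopt;
    }
    return m[1].str() + "." + m[2].str();
}

// Extract "X.Y" from nvcc output such as
// "Cuda compilation tools, release 10.1, V10.1.243".
std::optional<std::string> CudaFromNvccOutput(const std::string& text) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(R"(release (\d+\.\d+))"))) {
        return std::nullopt;
    }
    return m[1].str();
}
```

With the versions above, the two helpers return 12.1 and 10.1, which is why the system CUDA had to be upgraded to match PyTorch.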

Reference article: Installing CUDA + cuDNN on Ubuntu 22.04 (CSDN blog)

3. Deploying ONNX Runtime with C++ on Linux

Reference article: Installing and Using ONNX Runtime in Python and C++ (CSDN blog)

(1) First, clone the project with git

# Clone the ONNX Runtime repository
git clone --recursive https://github.com/microsoft/onnxruntime
cd onnxruntime

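You must use git here; otherwise the build fails with fatal: not a git repository (or any of the parent directories). The ONNX Runtime build scripts depend on the Git repository and on syncing its submodules, so they cannot work outside a Git checkout.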

Also, if the clone fails, the cause may be DNS resolution: in my case ping showed github.com resolving to my local address.

$ ping github.com
# PING github.com (127.0.0.1) 56(84) bytes of data.
# 64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.422 ms

Set a public DNS server manually:

sudo nano /etc/resolv.conf
# Replace the contents with the following:
nameserver 8.8.8.8
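To check from code whether the change took effect, a small helper can ask the system resolver whether a hostname resolves to a loopback address (127.0.0.0/8), which is what the ping output above showed. A minimal POSIX sketch; the function names are hypothetical:

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <string>

// True if a dotted-quad IPv4 string lies in 127.0.0.0/8 (loopback).
bool IsLoopbackIPv4(const std::string& ip) {
    in_addr addr{};
    if (inet_pton(AF_INET, ip.c_str(), &addr) != 1) return false;  // not an IPv4 literal
    return (ntohl(addr.s_addr) >> 24) == 127;
}

// True if any IPv4 address that `host` resolves to is a loopback address.
// Uses the system resolver, so /etc/hosts is consulted as well as DNS.
bool ResolvesToLoopback(const std::string& host) {
    addrinfo hints{};
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &res) != 0) return false;
    bool loopback = false;
    for (addrinfo* p = res; p != nullptr; p = p->ai_next) {
        const auto* sin = reinterpret_cast<const sockaddr_in*>(p->ai_addr);
        if ((ntohl(sin->sin_addr.s_addr) >> 24) == 127) loopback = true;
    }
    freeaddrinfo(res);  // release the list allocated by getaddrinfo
    return loopback;
}
```

ResolvesToLoopback("github.com") returning true reproduces the ping symptom. Because the resolver also reads /etc/hosts, a result that stays true after changing the DNS server may point to a github.com entry there.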

(2) Build

This step takes a long time.

# Build ONNX Runtime (Release configuration, as a shared library)
./build.sh --config Release --build_shared_lib

The build failed with: CMake 3.26 or higher is required. You are running version 3.16.3

The installed CMake is too old; download and install a recent release manually:

https://github.com/Kitware/CMake/releases/

I chose cmake-3.31.2-linux-x86_64.sh and installed it with:

sudo bash cmake-3.31.2-linux-x86_64.sh --skip-license --prefix=/usr/local
cmake --version # verify the new version

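If it still reports the old version, check that the new binary exists and put /usr/local/bin first on PATH:

ls /usr/local/bin/cmake  # check that the newly installed CMake binary exists
# /usr/local/bin/cmake
export PATH=/usr/local/bin:$PATH  # put /usr/local/bin first on PATH
hash -r  # make bash forget the cached location of the old cmake
cmake --version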
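The ONNX Runtime build rejected CMake 3.16.3 because it needs 3.26 or higher. A sketch of the same minimum-version check, parsing the first line of cmake --version (assumed to look like "cmake version 3.31.2"); the function names are hypothetical:

```cpp
#include <array>
#include <regex>
#include <string>

// Parse "cmake version X.Y.Z" into {X, Y, Z}; returns {0, 0, 0} on no match.
std::array<int, 3> ParseCmakeVersion(const std::string& text) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(R"(cmake version (\d+)\.(\d+)\.(\d+))"))) {
        return {0, 0, 0};
    }
    return {std::stoi(m[1].str()), std::stoi(m[2].str()), std::stoi(m[3].str())};
}

// True if `have` >= `need`. std::array compares lexicographically,
// i.e. major first, then minor, then patch.
bool AtLeast(const std::array<int, 3>& have, const std::array<int, 3>& need) {
    return have >= need;
}
```

Comparing the components as integers matters: compared as plain strings, "3.4" would sort after "3.26".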

(3) Set environment variables and build with CUDA

export PATH=/usr/local/cuda-12.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH
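./build.sh --config Release --build_shared_lib --use_cuda

--build_shared_lib is kept here so that libonnxruntime.so, which the test program below links against, is rebuilt with CUDA support. If the build script cannot find CUDA or cuDNN, pass their locations explicitly with --cuda_home and --cudnn_home.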

(4) Verify the build

When the build finishes, the output files are usually located in:

cd onnxruntime/build/Linux/Release

onnxruntime is the repository root. Next to it (in the same parent directory) I created a test file, onnx.cpp, with the following code:

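In newer versions GetInputName and GetOutputName are deprecated, so the code uses their replacements, GetInputNameAllocated and GetOutputNameAllocated. Download any .onnx model yourself, or use one of the many that ship with the onnxruntime repository.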

#include <onnxruntime/core/session/onnxruntime_cxx_api.h>
#include <vector>
#include <string>
#include <iostream>

int main() {
    // Initialize ONNX Runtime
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "example");

    // Create the SessionOptions
    Ort::SessionOptions session_options;
    session_options.SetIntraOpNumThreads(1);
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_BASIC);

    // Containers for the input and output names
    std::vector<std::string> input_names;
    std::vector<std::string> output_names;

    // Load the model
    Ort::Session session(env, "full_model.onnx", session_options);

    // Number of input nodes
    size_t num_input_nodes = session.GetInputCount();
    std::cout << "Number of inputs: " << num_input_nodes << std::endl;

    // Create an allocator
    Ort::AllocatorWithDefaultOptions allocator;

    // Print information about each input
    for (size_t i = 0; i < num_input_nodes; ++i) {
        // GetInputNameAllocated returns a smart pointer that owns the name;
        // copy it into a std::string before the pointer goes out of scope
        Ort::AllocatedStringPtr input_name_ptr = session.GetInputNameAllocated(i, allocator);
        input_names.push_back(input_name_ptr.get());

        // Type information of the input
        Ort::TypeInfo type_info = session.GetInputTypeInfo(i);
        auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
        // Element data type
        ONNXTensorElementDataType type = tensor_info.GetElementType();
        // Dimensions (-1 marks a dynamic dimension)
        std::vector<int64_t> input_dims = tensor_info.GetShape();

        // Print the input information
        std::cout << "Input " << i << ": " << input_names[i] << std::endl;
        std::cout << "  Data Type: " << type << std::endl;
        std::cout << "  Dimensions: ";
        for (size_t j = 0; j < input_dims.size(); ++j) {
            std::cout << input_dims[j];
            if (j < input_dims.size() - 1) std::cout << "x";
        }
        std::cout << std::endl;
    }

    // Number of output nodes
    size_t num_output_nodes = session.GetOutputCount();
    std::cout << "Number of outputs: " << num_output_nodes << std::endl;

    // Print information about each output
    for (size_t i = 0; i < num_output_nodes; ++i) {
        // GetOutputNameAllocated: same ownership rules as for inputs
        Ort::AllocatedStringPtr output_name_ptr = session.GetOutputNameAllocated(i, allocator);
        output_names.push_back(output_name_ptr.get());

        // Type information of the output
        Ort::TypeInfo type_info = session.GetOutputTypeInfo(i);
        auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
        // Element data type
        ONNXTensorElementDataType type = tensor_info.GetElementType();
        // Dimensions
        std::vector<int64_t> output_dims = tensor_info.GetShape();

        // Print the output information
        std::cout << "Output " << i << ": " << output_names[i] << std::endl;
        std::cout << "  Data Type: " << type << std::endl;
        std::cout << "  Dimensions: ";
        for (size_t j = 0; j < output_dims.size(); ++j) {
            std::cout << output_dims[j];
            if (j < output_dims.size() - 1) std::cout << "x";
        }
        std::cout << std::endl;
    }
    return 0;
}

Compile it (adjust the paths if your files are placed differently):

g++ -std=c++17 -I onnxruntime/include -L onnxruntime/build/Linux/Release -L/usr/local/cuda/lib64 onnx.cpp -o test_program -lonnxruntime -lcudart -lcublas

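g++: invokes the C++ front end of the GCC compiler.

-std=c++17: compiles with the C++17 standard.

-I: adds a directory to the header search path.

-L: adds a directory to the library search path.

-lonnxruntime: links against the libonnxruntime.so shared library.

-lcudart: links against libcudart.so, the CUDA runtime library.

-lcublas: links against libcublas.so, the CUDA linear algebra (BLAS) library.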

Run:

./test_program
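The test program prints -1 for dynamic dimensions (for example a variable batch size). To prepare real input tensors, those dimensions need concrete values, and the buffer needs the matching element count. A minimal stdlib-only sketch; the helper names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Format a shape as "1x3x224x224"; dynamic dimensions (-1) are shown as "?".
std::string FormatShape(const std::vector<int64_t>& dims) {
    std::string out;
    for (std::size_t i = 0; i < dims.size(); ++i) {
        if (i > 0) out += "x";
        out += dims[i] < 0 ? std::string("?") : std::to_string(dims[i]);
    }
    return out;
}

// Replace dynamic dimensions in place with `dynamic_value` (e.g. batch size 1)
// and return the total number of elements an input buffer must hold.
int64_t ElementCount(std::vector<int64_t>& dims, int64_t dynamic_value) {
    int64_t count = 1;
    for (int64_t& d : dims) {
        if (d < 0) d = dynamic_value;
        count *= d;
    }
    return count;
}
```

The filled-in shape and the element count are what a buffer passed to Ort::Value::CreateTensor<float> would need to match.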

4. ONNX inference in C++ on Linux

What I implemented was ONNX inference for CLIP.
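If the exported model returns only image and text embeddings, CLIP's similarity step has to be done in C++: L2-normalize both embeddings, take their dot product (cosine similarity), scale by the model's logit scale, and apply softmax over the candidate texts. A sketch, assuming the embeddings are available as std::vector<float> and that the caller passes in the logit scale (about 100 for the public OpenAI checkpoints; read the actual value from your model). Helper names are hypothetical:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// L2-normalize an embedding in place (CLIP compares normalized embeddings).
void L2Normalize(std::vector<float>& v) {
    double sum = 0.0;
    for (float x : v) sum += static_cast<double>(x) * x;
    const double norm = std::sqrt(sum);
    if (norm == 0.0) return;  // leave an all-zero vector unchanged
    for (float& x : v) x = static_cast<float>(x / norm);
}

// Cosine similarity of two equal-length vectors (copies are normalized).
float Cosine(std::vector<float> a, std::vector<float> b) {
    L2Normalize(a);
    L2Normalize(b);
    double dot = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) dot += static_cast<double>(a[i]) * b[i];
    return static_cast<float>(dot);
}

// Softmax over similarities scaled by logit_scale. The max is subtracted
// first for numerical stability (assumes logit_scale > 0).
std::vector<float> SoftmaxScaled(const std::vector<float>& sims, float logit_scale) {
    std::vector<float> out(sims.size());
    if (sims.empty()) return out;
    const float mx = *std::max_element(sims.begin(), sims.end());
    double total = 0.0;
    for (std::size_t i = 0; i < sims.size(); ++i) {
        out[i] = std::exp(logit_scale * (sims[i] - mx));
        total += out[i];
    }
    for (float& p : out) p = static_cast<float>(p / total);
    return out;
}
```

For one image and N candidate texts, compute Cosine(image, text_k) for each k and pass the N similarities to SoftmaxScaled to get per-text probabilities.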

The problems I ran into are summarized below:

1 ./onnx1: error while loading shared libraries: libonnxruntime.so.1: cannot open shared object file: No such file or directory

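The library's directory is probably not on the shared-library search path. Locate the library and add its directory to LD_LIBRARY_PATH:

sudo find / -name "libonnxruntime.so.1"
export LD_LIBRARY_PATH=/path/to/library:$LD_LIBRARY_PATH

The export only lasts for the current shell; add it to ~/.bashrc to make it permanent, or embed the directory in the executable at link time with -Wl,-rpath,/path/to/library.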
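Because the dynamic loader resolves libonnxruntime.so.1 before main runs, a program linked against it cannot diagnose this error itself. A separate small checker can, though. The sketch below splits LD_LIBRARY_PATH on colons, as the loader does, and reports whether a given directory (for example the one found with find) is listed. Helper names are hypothetical, and paths are compared as plain strings without normalization:

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split a colon-separated search path into its entries. Empty entries
// (which the loader treats as the current directory) are skipped here.
std::vector<std::string> SplitSearchPath(const std::string& value) {
    std::vector<std::string> entries;
    std::stringstream ss(value);
    std::string item;
    while (std::getline(ss, item, ':')) {
        if (!item.empty()) entries.push_back(item);
    }
    return entries;
}

// True if `dir` appears verbatim in this process's LD_LIBRARY_PATH.
bool InLdLibraryPath(const std::string& dir) {
    const char* env = std::getenv("LD_LIBRARY_PATH");
    if (env == nullptr) return false;  // variable not set at all
    for (const std::string& entry : SplitSearchPath(env)) {
        if (entry == dir) return true;
    }
    return false;
}
```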

2 TypeError: expected np.ndarray (got numpy.ndarray)

It turned out to be a numpy version conflict, so I started removing numpy with sudo pip uninstall numpy, and found that I had several numpy versions installed:

Found existing installation: numpy 1.20.0

Found existing installation: numpy 1.23.2

Found existing installation: numpy 1.26.4

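I kept uninstalling until pip could remove nothing more; the last one, Found existing installation: numpy 1.17.1, is the system-wide version and cannot be removed. I then installed numpy with conda instead. Otherwise, make sure you install with the intended pip: which -a pip lists every pip on your PATH.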

3 Converting float16 to float32

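When the PyTorch model was exported to ONNX, one of the outputs had type float16 in PyTorch, and reading it in C++ gave wrong values, so it had to be converted first. ONNX Runtime has its own conversion utility, but linking against it kept failing: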

#include <onnxruntime/core/providers/cpu/math/float16.h>

// Get the Quality Prediction data
auto* quality_prediction_data = output_tensors[2].GetTensorMutableData<uint16_t>();
// Convert the float16 data to float32
onnxruntime::math::half quality_as_half(quality_prediction_data[0]);
quality_prediction_output = static_cast<float>(quality_as_half);

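So I simply wrote my own conversion function instead:

#include <cmath>
#include <cstdint>
#include <cstring>

// Convert an IEEE 754 half-precision value (raw bits in a uint16_t) to float32
float Float16ToFloat(uint16_t value) {
    uint32_t t1 = value & 0x7fff;  // low 15 bits: exponent + mantissa
    uint32_t t2 = value & 0x8000;  // sign bit
    uint32_t t3 = value & 0x7c00;  // exponent bits
    t1 <<= 13;                     // shift left 13 bits to align the mantissa with float32
    t2 <<= 16;                     // shift left 16 bits to move the sign to bit 31
    if (t3 == 0) {
        // zero or subnormal half: value = mantissa * 2^-24
        float magnitude = std::ldexp(static_cast<float>(value & 0x03ff), -24);
        return t2 ? -magnitude : magnitude;
    }
    if (t3 == 0x7c00) {
        t1 |= 0x7f800000;          // Inf/NaN: set every float32 exponent bit, keep the mantissa
    } else {
        t1 += 0x38000000;          // normal number: re-bias the exponent by 127 - 15 = 112 (112 << 23)
    }
    t1 |= t2;                      // merge the sign bit
    float result;
    std::memcpy(&result, &t1, sizeof(result));  // reinterpret the bits as a float
    return result;
}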
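A slower but easy-to-audit reference decoder is handy for cross-checking a bit-manipulation converter such as Float16ToFloat on known bit patterns. This sketch computes the value arithmetically from the three IEEE 754 half-precision fields instead of rearranging bits; the name HalfToFloatReference is hypothetical:

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode a half-precision value from its fields:
//   sign: bit 15, exponent: bits 10-14 (bias 15), mantissa: bits 0-9.
float HalfToFloatReference(uint16_t h) {
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1f;
    const int mantissa = h & 0x3ff;
    float magnitude;
    if (exponent == 0) {
        // zero or subnormal: mantissa * 2^-24
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 31) {
        // all-ones exponent: infinity if the mantissa is 0, otherwise NaN
        magnitude = mantissa == 0 ? std::numeric_limits<float>::infinity()
                                  : std::numeric_limits<float>::quiet_NaN();
    } else {
        // normal: (1 + mantissa / 1024) * 2^(exponent - 15)
        magnitude = std::ldexp(1.0f + mantissa / 1024.0f, exponent - 15);
    }
    return sign ? -magnitude : magnitude;
}
```

Since a half has only 65536 bit patterns, a converter can be checked exhaustively against this reference in a loop, treating NaN results as equal when both sides are NaN.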

When inspecting the .onnx model earlier with the test program, an output Data Type of 10 means float16, and 1 means float32.
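Printing names instead of raw numbers makes that check easier to read. A small sketch mapping the element type codes to names; the numbering follows the ONNX TensorProto.DataType enum, which ONNX Runtime's ONNXTensorElementDataType mirrors, and the function name ElementTypeName is hypothetical:

```cpp
#include <string>

// Map an ONNX tensor element type code (the "Data Type" number printed by
// the test program) to a readable name.
std::string ElementTypeName(int code) {
    switch (code) {
        case 1:  return "float32";
        case 2:  return "uint8";
        case 3:  return "int8";
        case 4:  return "uint16";
        case 5:  return "int16";
        case 6:  return "int32";
        case 7:  return "int64";
        case 8:  return "string";
        case 9:  return "bool";
        case 10: return "float16";
        case 11: return "float64";
        case 12: return "uint32";
        case 13: return "uint64";
        case 16: return "bfloat16";
        default: return "unknown(" + std::to_string(code) + ")";
    }
}
```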

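4 Avoid converting data to native Python types during tracing, because the tracer may then fail to record the data flow correctly. An ONNX model needs a static data flow so that a computation graph can be built. Replace list operations with tensor operations; the ONNX model's inputs and outputs must all be tensors as well.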

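5 Error during ONNX export: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

This means the model and the export inputs (or a tensor created inside forward) are on different devices; move the model and the example inputs to the same device before exporting.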