I. Definition
1. Definition
2. Example
3. Calling CUDA C++ from PyCUDA and running it in a kernel
4. Interfaces
II. Implementation
二、实现
1. Definition
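PyCUDA is a Python library built on NVIDIA CUDA for high-performance computing on the GPU. It exposes the CUDA driver API to Python through an interface close to CUDA C, so the GPU's parallel computing power can easily be applied to scientific computing, machine learning, deep learning, and similar workloads.
Official documentation: https://documen.tician.de/pycuda/
Chinese tutorial: https://www.osgeo.cn/pycuda/driver.html#pycuda.driver.register_host_memory
Install (here via the Baidu PyPI mirror):
pip install pycuda -i https://mirror.baidu.com/pypi/simple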
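Before running the examples, it helps to confirm that PyCUDA can actually see a GPU. The sketch below is a hypothetical helper, `list_cuda_devices`, built on `pycuda.driver.init`, `Device.count`, and `Device(i).name()` / `compute_capability()` / `total_memory()`. Returning an empty list (instead of raising) when PyCUDA or the CUDA driver is missing is an assumption about how you want failures handled.

```python
def list_cuda_devices():
    """Return (name, (major, minor), total memory in MiB) for each visible GPU.

    Hypothetical helper: returns [] if PyCUDA or the CUDA driver is unavailable.
    """
    try:
        import pycuda.driver as cuda  # fails if PyCUDA or libcuda is missing
    except ImportError:
        return []
    try:
        cuda.init()  # initialize the driver API without creating a context
        devices = []
        for i in range(cuda.Device.count()):
            dev = cuda.Device(i)
            devices.append((dev.name(), dev.compute_capability(), dev.total_memory() // 2**20))
        return devices
    except cuda.Error:  # e.g. driver installed but no usable GPU
        return []


if __name__ == "__main__":
    devices = list_cuda_devices()
    if not devices:
        print("No CUDA-capable device found (or PyCUDA is not installed).")
    for name, (major, minor), mem_mib in devices:
        print(f"{name}: compute capability {major}.{minor}, {mem_mib} MiB")
```

If this prints no device, the examples below will fail at `import pycuda.autoinit`.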
2. Example
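import numpy
import pycuda.autoinit  # initialize CUDA and create a context on the default device
import pycuda.driver as cuda

a = numpy.random.randn(4, 4)
a = a.astype(numpy.float32)  # convert to 32-bit float (C `float`), the type the kernels below expect
a_gpu = cuda.mem_alloc(a.nbytes)  # allocate linear device memory of the same size as a
cuda.memcpy_htod(a_gpu, a)  # copy a from host to device
a_doubled = numpy.empty_like(a)  # host buffer for the result
cuda.memcpy_dtoh(a_doubled, a_gpu)  # copy a_gpu back to the host; no kernel has run, so a_doubled equals a
print(a_doubled)
print(a)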
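`cuda.mem_alloc` and `cuda.memcpy_htod` work in raw bytes, so the host array's dtype decides how a kernel will interpret them. `numpy.random.randn` returns float64, which is why the example converts to float32 before copying: a kernel taking `float *` reads 4-byte values. The NumPy-only sketch below (it runs without a GPU) shows what happens if float64 bytes are read as float32.

```python
import numpy as np

a64 = np.random.randn(4, 4)        # randn returns float64 (8 bytes per element)
a32 = a64.astype(np.float32)       # 4 bytes per element, matching a C `float`

print(a64.dtype, a64.nbytes)       # float64 128
print(a32.dtype, a32.nbytes)       # float32 64

# Reinterpret the float64 buffer as float32, as a `float *` kernel argument would.
misread = np.frombuffer(a64.tobytes(), dtype=np.float32)
print(misread.size)                # 32 values, not the 16 that were intended
```

Without the `astype` call, `mem_alloc(a.nbytes)` would still succeed, but a `float *` kernel would see meaningless values.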
3. Calling CUDA C++ from PyCUDA and running it in a kernel
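# Example 1: print from a GPU kernel
import pycuda.autoinit  # create a CUDA context on the default device
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

kernel_code = r"""
__global__ void hello_from_gpu(void)
{
    printf("Hello World from the GPU!\n");
}
"""
mod = SourceModule(kernel_code)  # compile the CUDA source with nvcc
hello_from_gpu = mod.get_function("hello_from_gpu")  # look up the kernel by name
hello_from_gpu(block=(1, 1, 1))  # launch one block containing one thread
cuda.Context.synchronize()  # wait for the kernel; this also flushes the device-side printf buffer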
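The `block` argument sets the thread-block shape, and every launched thread executes the kernel body (including `printf`) once, so `block=(1,1,1)` prints a single line. As a rough mental model, the number of lines equals the number of threads launched, i.e. the product of the `block` and `grid` dimensions. The helper below, `threads_launched`, is hypothetical (not a PyCUDA API) and only does that arithmetic; the order in which lines from different threads appear should not be relied on.

```python
from math import prod


def threads_launched(block, grid=(1, 1)):
    """Total threads for a launch with the given block and grid shapes.

    Hypothetical helper mirroring PyCUDA's `block=` (3-tuple) and `grid=`
    (2- or 3-tuple) keyword arguments.
    """
    return prod(block) * prod(grid)


print(threads_launched((1, 1, 1)))           # 1: the launch in the example above
print(threads_launched((4, 1, 1)))           # 4
print(threads_launched((32, 1, 1), (2, 1)))  # 64
```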
# Example 2: double every element of a 4x4 matrix on the GPU
import numpy
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

a = numpy.random.randn(4, 4)
a = a.astype(numpy.float32)
a_gpu = cuda.mem_alloc(a.nbytes)  # allocate device memory
cuda.memcpy_htod(a_gpu, a)  # copy host -> device

mod = SourceModule("""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y * 4;  // row-major index into the 4x4 matrix
    a[idx] *= 2;
}
""")
func = mod.get_function("doublify")
func(a_gpu, block=(4, 4, 1))  # one block of 4x4 threads: one thread per element

a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)  # copy device -> host
print(a_doubled)
print(a)
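To see why `block=(4,4,1)` covers the matrix exactly, the NumPy sketch below runs the kernel body once per `(threadIdx.x, threadIdx.y)` pair on the CPU. `doublify_on_cpu` is a hypothetical emulation for illustration, not something PyCUDA provides. Because `a` is C-contiguous (row-major), `threadIdx.x + threadIdx.y*4` maps each of the 16 threads to a distinct element.

```python
import numpy as np


def doublify_on_cpu(a, block=(4, 4, 1)):
    """CPU emulation of the doublify kernel: one loop iteration per (threadIdx.x, threadIdx.y)."""
    flat = a.copy().ravel()           # the kernel sees a as flat, row-major (C-order) memory
    bx, by, _ = block
    for ty in range(by):              # threadIdx.y
        for tx in range(bx):          # threadIdx.x
            idx = tx + ty * 4         # same index formula as the kernel (row length 4)
            flat[idx] *= 2
    return flat.reshape(a.shape)


# Which element each thread touches: row = threadIdx.y, column = threadIdx.x.
index_map = np.array([[tx + ty * 4 for tx in range(4)] for ty in range(4)])
print(index_map)                      # 0..15 laid out row by row, each index exactly once

a = np.random.randn(4, 4).astype(np.float32)
print(np.array_equal(doublify_on_cpu(a), 2 * a))  # True
```

Passing a larger block such as `(8, 8, 1)` makes `idx` exceed 15: the emulation raises `IndexError`, while on the GPU it would be an out-of-bounds write, which is why kernels for general sizes usually compare the index against the array length first.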
4. Interfaces
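import numpy as np
import pycuda.autoinit  # create a CUDA context on the default device
import pycuda.gpuarray as gpuarray

a = gpuarray.to_gpu(np.random.rand(1, 10).astype(np.float32))  # copy a host array to the GPU
b = gpuarray.to_gpu(np.random.rand(1, 10).astype(np.float32))
c = gpuarray.maximum(a, b)  # element-wise maximum, computed on the GPU
print(a, b, c)  # printing a GPUArray copies it back to the host first

m, n = 3, 4  # example shape
ary = gpuarray.to_gpu(np.random.rand(m, n).astype(np.float32))  # example template GPUArray
gpu_ary = gpuarray.zeros((m, n), dtype=np.float32)  # allocate GPU memory and fill it with zeros
gpu_ary = gpuarray.empty((m, n), dtype=np.float32)  # allocate GPU memory without initializing it
gpu_ary = gpuarray.zeros_like(ary)  # zero-filled GPU array with ary's shape and dtype, so ary should preferably be np.float32
gpu_ary = gpuarray.empty_like(ary)  # uninitialized GPU array with ary's shape and dtype (again, preferably np.float32)
gpu_ary = gpuarray.arange(0, 10, 1, dtype=np.float32)  # sequence start, start+step, ... below stop; specify dtype=np.float32
ind = gpuarray.to_gpu(np.array([0, 2, 4], dtype=np.int32))  # integer indices on the GPU
gpu_ary = gpuarray.take(a, ind)  # returns the GPUArray [a[ind[0]], a[ind[1]], ...]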
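`pycuda.gpuarray` functions are modeled on their NumPy counterparts, so a convenient way to sanity-check GPU results is to compare the output of `.get()` against NumPy on the host. The NumPy-only sketch below computes host-side references for `maximum`, `arange`, and `take`; the variable names are illustrative, and the GPU comparison is left as a comment because it needs a CUDA device. Using `a.ravel()` for `take` assumes indexing over the array's flat memory, which matches the 1×10 `a` used above.

```python
import numpy as np

a = np.random.rand(1, 10).astype(np.float32)
b = np.random.rand(1, 10).astype(np.float32)

ref_max = np.maximum(a, b)                         # reference for gpuarray.maximum(a_gpu, b_gpu)
ref_range = np.arange(0, 5, 1, dtype=np.float32)   # reference for gpuarray.arange(0, 5, 1, dtype=np.float32)

ind = np.array([0, 2, 4], dtype=np.int32)
ref_take = np.take(a.ravel(), ind)                 # [a[ind[0]], a[ind[1]], a[ind[2]]]

# With a GPU available, compare for example:
# np.testing.assert_allclose(gpuarray.maximum(a_gpu, b_gpu).get(), ref_max)
print(ref_max.shape, ref_range, ref_take.shape)
```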