1. Error message:
block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed
block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed
block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
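The last line of the log suggests `CUDA_LAUNCH_BLOCKING=1`. A minimal sketch of setting it from inside a Python script, assuming the variable has to be in place before CUDA is initialized (so it is set before the framework import):

```python
import os

# Make CUDA kernel launches synchronous so the Python stack trace
# points at the call that actually triggered the assert.
# Assumption: this must run before CUDA is initialized, so it is
# placed ahead of the framework import.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import the framework only after the variable is set
```

The same effect can be had from the shell by prefixing the launch command with `CUDA_LAUNCH_BLOCKING=1`.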
Solution (mainly see the screenshot):
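The assertion text (`idx_dim >= 0 && idx_dim < index_size`) says that some index value falls outside the dimension being indexed. Below is a minimal sketch of a check to run before the indexing call, assuming the indices can be pulled out as a flat list of integers (for a tensor, something like `index.flatten().tolist()`). The helper name `find_out_of_range` is hypothetical and not part of PyTorch.

```python
def find_out_of_range(indices, size):
    """Return (position, value) pairs for indices outside [0, size)."""
    return [(pos, idx) for pos, idx in enumerate(indices)
            if not 0 <= idx < size]


# Hypothetical example: 5 valid slots, so legal indices are 0..4.
bad = find_out_of_range([0, 3, 5, -1, 4], size=5)
print(bad)  # [(2, 5), (3, -1)]
```

Running a check like this on the CPU pinpoints the offending values, which the asynchronous CUDA stack trace usually does not.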
2. Error message:
/pytorch/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [0,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [0,0,0], thread: [1,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:115: operator(): block: [0,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
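Solution (mainly see the screenshot):
This error usually has one of the following causes. Check whether your code hits any of them.
- NaN values are present.
- The predictions and the labels have different lengths.
- The predictions or the labels fall outside the range [0, 1].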
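The three causes above can be checked in one pass before the loss is computed. This is a dependency-free sketch, not the author's code: the helper names `_flatten` and `diagnose_bce_inputs` are hypothetical, and it assumes the inputs are either nested lists or tensor-like objects that expose a `tolist()` method.

```python
import math


def _flatten(x):
    """Turn a tensor-like object or nested list into a flat list of floats."""
    if hasattr(x, "tolist"):
        x = x.tolist()
    if isinstance(x, (list, tuple)):
        out = []
        for item in x:
            out.extend(_flatten(item))
        return out
    return [float(x)]


def diagnose_bce_inputs(preds, targets):
    """Return a list naming each of the three problems that is present."""
    p, t = _flatten(preds), _flatten(targets)
    problems = []
    if any(math.isnan(v) for v in p + t):
        problems.append("nan")
    if len(p) != len(t):
        problems.append("length mismatch")
    # NaN is reported separately above, so skip it in the range check.
    if any(not math.isnan(v) and not 0.0 <= v <= 1.0 for v in p + t):
        problems.append("out of range")
    return problems


print(diagnose_bce_inputs([0.2, float("nan"), 1.5], [0, 1]))
```

An empty list means none of the three conditions was detected. For tensors that track gradients, passing `preds.detach()` is the safer choice.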
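After printing the variables, I found that my predictions contained NaN values, even though the same code ran fine on other data. That made me suspect the data itself, so I went through it and eventually found that some of the samples were abnormal.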
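The post does not say what exactly was wrong with the data. Assuming "abnormal" means NaN or infinite feature values, a scan along these lines can locate the offending samples. The helper `find_bad_samples` and the toy data are hypothetical.

```python
import math


def find_bad_samples(samples):
    """Return indices of samples containing NaN or infinite feature values."""
    return [i for i, features in enumerate(samples)
            if any(not math.isfinite(v) for v in features)]


# Toy dataset: samples 1 and 2 contain non-finite values.
data = [[0.1, 0.2], [float("nan"), 1.0], [3.0, float("inf")], [0.5, 0.5]]
print(find_bad_samples(data))  # [1, 2]
```

Other kinds of bad data (wrong scale, empty fields, mislabeled rows) would need their own checks.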