Running import tensorflow produces the following errors:
2024-11-05 13:43:37.587423: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2024-11-05 13:43:37.587470: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
The error message says /dev/nvidia0 does not exist, so I first ran a command to see where the device node actually is:
ls -l /dev/ | grep nvidia
It turns out the device node is /dev/nvidia3.
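
For reference, the same check can be done from Python. This is only a rough sketch: the helper name list_nvidia_device_nodes is made up for illustration, and it assumes the GPU device nodes follow the usual /dev/nvidiaN naming (control nodes such as nvidiactl are skipped).

```python
import glob
import os
import re


def list_nvidia_device_nodes(dev_dir="/dev"):
    """Return numbered NVIDIA device nodes (e.g. /dev/nvidia3), sorted by index.

    Hypothetical helper for illustration; non-numbered nodes such as
    nvidiactl or nvidia-uvm are ignored.
    """
    numbered = re.compile(r"nvidia(\d+)")
    found = []
    for path in glob.glob(os.path.join(dev_dir, "nvidia*")):
        match = numbered.fullmatch(os.path.basename(path))
        if match:
            found.append((int(match.group(1)), path))
    # Sort numerically so nvidia10 comes after nvidia3.
    return [path for _, path in sorted(found)]


if __name__ == "__main__":
    print(list_nvidia_device_nodes())
```

If this prints only a node with a non-zero index, that matches what ls showed: the GPU is exposed, just not as /dev/nvidia0.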

Also, the following command does detect the A100:
nvidia-smi

I really had no idea what to do, so I just created a symlink with ln -s /dev/nvidia3 /dev/nvidia0. I know it is a bad approach, and sure enough, it led to other errors:
2024-11-05 13:53:53.975542: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2024-11-05 13:53:53.975594: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: runtime-service-hao-pu2-7b657df56d-zg8ht
2024-11-05 13:53:53.975601: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: runtime-service-hao-pu2-7b657df56d-zg8ht
2024-11-05 13:53:53.975691: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: NOT_FOUND: was unable to find libcuda.so DSO loaded into this program
2024-11-05 13:53:53.975721: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 535.161.8
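
The last log says libcuda.so could not be found in the process, so one thing that can be checked independently of TensorFlow is whether the driver library loads at all. Below is a minimal sketch using ctypes; the function name probe_libcuda and the candidate library names are my own assumptions, not something TensorFlow provides. cuInit is the CUDA driver API call that appears in the log, and it returns 0 on success and 100 for CUDA_ERROR_NO_DEVICE.

```python
import ctypes


def probe_libcuda(names=("libcuda.so.1", "libcuda.so")):
    """Try to load the CUDA driver library and call cuInit(0).

    Hypothetical diagnostic helper. Returns (library_name, cuInit_result),
    or (None, None) if none of the candidate names can be loaded.
    """
    for name in names:
        try:
            lib = ctypes.CDLL(name)
        except OSError:
            # Library not on the loader path; try the next candidate.
            continue
        # cuInit takes an unsigned int flags argument, which must be 0.
        return name, lib.cuInit(0)
    return None, None


if __name__ == "__main__":
    name, result = probe_libcuda()
    if name is None:
        print("libcuda could not be loaded (check LD_LIBRARY_PATH / driver mount)")
    else:
        print(f"loaded {name}, cuInit returned {result}")
```

If the library cannot be loaded here either, the problem is probably in how the driver libraries are exposed to this environment rather than in the /dev/nvidia* naming, but that is only a guess from the logs above.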