RuntimeError cuDNN error CUDNN_STATUS_EXECUTION_FAILED Solution

Posted on 2020-01-03 | Updated on 2022-08-18 | Category: Machine Learning

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

When you encounter the above error and Google it, you will find lots of discussions. Unfortunately, very few of them offer a fix that actually works.

In my case, the root cause was a compatibility issue between the PyTorch, CUDA, and Python versions.

Solution

The solution is straightforward: downgrade PyTorch, or install a different version of CUDA or Python.

My environment:

Ubuntu 18.04 LTS
Python 3.6.9
PyTorch 1.3.0
CUDA 10.1
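
Since the fix depends on which versions are paired together, it helps to print them from inside the Python interpreter that actually fails. The following is a minimal sketch, not part of the original setup: `describe_environment` is a hypothetical helper name, and it only relies on `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, and `torch.backends.cudnn.version()`. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the toolkit installed system-wide.

```python
import platform


def describe_environment():
    """Collect the version numbers that matter for this error (hypothetical helper)."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,
        "cudnn": None,
        "gpu_available": False,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter; report what we can.
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    info["gpu_available"] = torch.cuda.is_available()
    # cuDNN version as an integer, or None if cuDNN is unavailable.
    info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:>14}: {value}")
```

Running this before and after changing versions gives a record of which combination failed and which one worked.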

This command resolved my issue (the PyTorch version really matters: I went from 1.3.0 to 1.2.0):

conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch
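
After reinstalling, a quick check saves a full training run. The traceback in this post fails inside a GRU forward pass, so the sketch below runs one small `nn.GRU` forward pass on the GPU and reports the result. The function name `gru_smoke_test` and the tensor sizes are assumptions made for illustration; a passing run suggests, but does not prove, that the original script will work too.

```python
def gru_smoke_test(input_size=8, hidden_size=16, seq_len=5):
    """Run one GRU forward pass on the GPU and report how it went.

    Returns "no-torch", "no-gpu", "ok", or "failed: <message>".
    """
    try:
        import torch
        import torch.nn as nn
    except ImportError:
        return "no-torch"

    if not torch.cuda.is_available():
        return "no-gpu"

    try:
        device = torch.device("cuda")
        gru = nn.GRU(input_size, hidden_size).to(device)
        # Shape is (seq_len, batch, input_size), the default layout for nn.GRU.
        x = torch.randn(seq_len, 1, input_size, device=device)
        h0 = torch.zeros(1, 1, hidden_size, device=device)
        gru(x, h0)
        # CUDA kernels run asynchronously; synchronize so errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as exc:
        return "failed: {}".format(exc)
    return "ok"


if __name__ == "__main__":
    print(gru_smoke_test())
```

If this prints a message starting with `failed:`, the installed combination is probably still mismatched and another one is worth trying.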

If you use conda, you can refer to this page to find a working combination (downgrading might work better than upgrading): Installing Previous Versions of PyTorch
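
When trying several combinations, it can help to keep the pins in one place and generate the install command from them. This is only a sketch: `KNOWN_GOOD` and `conda_install_command` are hypothetical names, and the single entry is the combination that worked in this post. Any other versions should be copied from the official page rather than guessed.

```python
# Combination reported as working in this post. Add other entries only after
# checking them against the official "previous versions" page.
KNOWN_GOOD = {
    "pytorch": "1.2.0",
    "torchvision": "0.4.0",
    "cudatoolkit": "10.0",
}


def conda_install_command(pins, channel="pytorch"):
    """Build a conda install command from a dict of pinned versions."""
    parts = ["conda", "install"]
    # pytorch and torchvision are pinned exactly with "==".
    for package in ("pytorch", "torchvision"):
        parts.append("{}=={}".format(package, pins[package]))
    # cudatoolkit uses "=" so conda can match any build of that version.
    parts.append("cudatoolkit={}".format(pins["cudatoolkit"]))
    parts.extend(["-c", channel])
    return " ".join(parts)


if __name__ == "__main__":
    print(conda_install_command(KNOWN_GOOD))
```

With the pins above, the function reproduces the command used in this post.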

Detailed Error Message

$ python seq2seq.py
Traceback (most recent call last):
  File "seq2seq.py", line 311, in <module>
    trainIters(encoder, attndecoder, 75000, print_every=5000)
  File "seq2seq.py", line 289, in trainIters
    decoder, encoder_optimizer, decoder_optimizer, criterion)
  File "seq2seq.py", line 233, in train
    encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "seq2seq.py", line 154, in forward
    output, hidden = self.gru(output, hidden)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 729, in forward
    return self.forward_tensor(input, hx)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 721, in forward_tensor
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 699, in forward_impl
    result = self.run_impl(input, hx, batch_sizes)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 680, in run_impl
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

DO NOT Upgrade Ubuntu

Some people claim that newer Ubuntu releases fix some low-level libraries. I upgraded Ubuntu from 14.04 to 16.04 and then to 18.04, and the error did not go away.