RuntimeError cuDNN error CUDNN_STATUS_EXECUTION_FAILED Solution
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
When you encounter the above error and Google it, you will find lots of discussions. Unfortunately, very few of them offer a fix that actually works.
In my case, the root cause was a compatibility issue between the PyTorch, CUDA, and Python versions.
Solution
The solution is straightforward. Downgrading PyTorch and installing a different version of CUDA or Python should be enough.
My environment:
- Ubuntu 18.04 LTS
- Python 3.6.9
- PyTorch 1.3.0
- cuda 10.1
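To compare your own setup against the list above, a short script can print the versions that PyTorch itself reports. This is only a minimal sketch: the function name collect_versions is made up for illustration, and it assumes nothing beyond a Python interpreter (it reports torch as missing rather than failing). Note that torch.version.cuda is the CUDA version PyTorch was built against, which can differ from the toolkit installed system-wide.

```python
import platform


def collect_versions():
    """Return a dict of the Python / PyTorch / CUDA / cuDNN versions in use.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    versions = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        versions["torch"] = None
        return versions

    versions["torch"] = torch.__version__
    # CUDA version PyTorch was compiled against (None on CPU-only builds).
    versions["cuda_build"] = torch.version.cuda
    try:
        versions["cudnn"] = torch.backends.cudnn.version()
    except RuntimeError:
        # Some builds raise instead of returning None when cuDNN is unusable.
        versions["cudnn"] = None
    versions["cuda_available"] = torch.cuda.is_available()
    return versions


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

Run it inside the same conda environment you train in, so the numbers match what seq2seq.py actually uses.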
This command resolved my issue (the PyTorch version really matters! I went from 1.3.0 to 1.2.0):
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch
If you use conda, you can refer to this page to find a working combination (downgrading might work better than upgrading): INSTALLING PREVIOUS VERSIONS OF PYTORCH
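After reinstalling, one way to check whether the new combination works is to run a tiny GRU forward pass on the GPU, since the GRU call is where the traceback under Detailed Error Message fails. The sketch below is an assumption about how such a check could look, not the code from seq2seq.py; the function name gru_smoke_test and the tensor sizes are arbitrary.

```python
def gru_smoke_test():
    """Run one small GRU forward pass on the GPU.

    Returns "ok" on success, "no-torch" if PyTorch is not installed,
    or "no-cuda" if no GPU is visible. If the version combination is
    still broken, the forward pass is expected to raise RuntimeError.
    Hypothetical helper for illustration only.
    """
    try:
        import torch
        import torch.nn as nn
    except ImportError:
        return "no-torch"

    if not torch.cuda.is_available():
        return "no-cuda"

    device = torch.device("cuda")
    gru = nn.GRU(input_size=8, hidden_size=8).to(device)
    x = torch.randn(1, 1, 8, device=device)  # (seq_len, batch, input_size)
    h = torch.zeros(1, 1, 8, device=device)  # (num_layers, batch, hidden_size)

    output, hidden = gru(x, h)
    # Wait for the GPU so any asynchronous error surfaces here.
    torch.cuda.synchronize()
    return "ok"


if __name__ == "__main__":
    print(gru_smoke_test())
```

If this prints ok, the full training script is worth retrying; if it raises the same cuDNN error, try another version combination.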
Detailed Error Message
$ python seq2seq.py
Traceback (most recent call last):
  File "seq2seq.py", line 311, in <module>
    trainIters(encoder, attndecoder, 75000, print_every=5000)
  File "seq2seq.py", line 289, in trainIters
    decoder, encoder_optimizer, decoder_optimizer, criterion)
  File "seq2seq.py", line 233, in train
    encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "seq2seq.py", line 154, in forward
    output, hidden = self.gru(output, hidden)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 729, in forward
    return self.forward_tensor(input, hx)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 721, in forward_tensor
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 699, in forward_impl
    result = self.run_impl(input, hx, batch_sizes)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 680, in run_impl
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
DO NOT Upgrade Ubuntu
Someone said that newer Ubuntu releases fixed some low-level libraries. I upgraded Ubuntu from 14.04 to 16.04 and then to 18.04, but the error persisted.