RuntimeError cuDNN error CUDNN_STATUS_EXECUTION_FAILED Solution
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
When you hit the error above and Google it, you will find plenty of discussions. Unfortunately, very few of them are useful or actually work.
Actually, the root cause is a compatibility issue between the PyTorch, CUDA, and Python versions.
Solution
The solution is straightforward. Downgrading PyTorch and installing a different version of CUDA or Python should fix it.
My environment:
- Ubuntu 18.04 LTS
- Python 3.6.9
- PyTorch 1.3.0
- CUDA 10.1
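To compare your own setup with the list above, a minimal sketch like the following can print the relevant versions. The helper name describe_environment is hypothetical, and the script assumes it is run with the same interpreter that raises the error. If PyTorch cannot be imported, the PyTorch-related fields stay None.

```python
import platform


def describe_environment():
    """Return a dict with the versions relevant to this error (hypothetical helper)."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "torch": None,
        "cuda (built against)": None,
        "cuda available": None,
        "cudnn": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed for this interpreter; leave the fields as None.
        return info

    info["torch"] = torch.__version__
    # CUDA version the PyTorch binary was built against (None for CPU-only builds).
    info["cuda (built against)"] = torch.version.cuda
    info["cuda available"] = torch.cuda.is_available()
    if torch.backends.cudnn.is_available():
        info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Note that the CUDA version reported here is the one bundled with the PyTorch build, which may differ from the system-wide CUDA installation.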
This command resolved my issue (the PyTorch version really matters: I went from 1.3.0 to 1.2.0):
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch
If you use conda, you can refer to the "Installing Previous Versions of PyTorch" page to find a working combination (downgrading might work better than upgrading).
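After reinstalling, it may help to confirm that a GRU forward pass runs before starting a long training job, since the error in my case came from a GRU call. The following is only a sketch: the function name gru_smoke_test and the tensor sizes are arbitrary choices for illustration, not taken from seq2seq.py. It falls back to the CPU when no GPU is visible, and returns None when PyTorch is not importable.

```python
def gru_smoke_test():
    """Run one tiny GRU forward pass; return (output shape, hidden shape)."""
    try:
        import torch
        import torch.nn as nn
    except ImportError:
        return None  # PyTorch is not installed for this interpreter

    # Use the GPU when it is visible, since that is where cuDNN is exercised.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    gru = nn.GRU(input_size=8, hidden_size=8).to(device)
    x = torch.randn(1, 1, 8, device=device)   # (seq_len, batch, input_size)
    h = torch.zeros(1, 1, 8, device=device)   # (num_layers, batch, hidden_size)

    # With an incompatible install, the RuntimeError would be expected here.
    output, hidden = gru(x, h)
    return tuple(output.shape), tuple(hidden.shape)


if __name__ == "__main__":
    print(gru_smoke_test())
```

If this runs on the GPU without raising, the combination of versions is at least able to execute the cuDNN GRU path. It does not guarantee that every other operation works.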
Detailed Error Message
$ python seq2seq.py
Traceback (most recent call last):
  File "seq2seq.py", line 311, in <module>
    trainIters(encoder, attndecoder, 75000, print_every=5000)
  File "seq2seq.py", line 289, in trainIters
    decoder, encoder_optimizer, decoder_optimizer, criterion)
  File "seq2seq.py", line 233, in train
    encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "seq2seq.py", line 154, in forward
    output, hidden = self.gru(output, hidden)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 729, in forward
    return self.forward_tensor(input, hx)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 721, in forward_tensor
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 699, in forward_impl
    result = self.run_impl(input, hx, batch_sizes)
  File "/home/user/miniconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 680, in run_impl
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
DO NOT Upgrade Ubuntu
Someone said that a newer Ubuntu fixes some low-level libraries. I upgraded Ubuntu from 14.04 to 16.04 and then to 18.04, and it made no difference.