Mar 31, 2021
Glad this was of help! And yes, I agree that adding more layers such as dropout or normalization can help improve the results.
Also, regarding return_sequences, I have observed that the sequence length matters too: the preferred architecture can differ depending on whether the LSTM sequence "length" is small or large. But I guess the major factor is the data, as always ☺
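For anyone following along, here's a rough sketch (assuming TensorFlow/Keras, with made-up layer sizes and input shapes) of how return_sequences, dropout, and normalization might be combined when stacking LSTMs:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

timesteps, n_features = 50, 8  # hypothetical sequence length and feature count

model = models.Sequential([
    layers.Input(shape=(timesteps, n_features)),
    # First LSTM returns the full sequence so the next LSTM sees every timestep.
    layers.LSTM(64, return_sequences=True),
    layers.Dropout(0.2),            # regularization between recurrent layers
    layers.LayerNormalization(),    # normalization, as mentioned above
    # Final LSTM returns only the last hidden state for a single prediction.
    layers.LSTM(32, return_sequences=False),
    layers.Dense(1),
])
model.summary()
```

Not meant as a definitive recipe, just an illustration of where those pieces usually sit; what actually works best will depend on the data and the sequence length.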
Cheers.