This seems rather obvious, but I wanted to double-check before building a machine learning model on RNAfold’s and RNAcofold’s predictions for sequences of varying length.
I generated 30,000 random RNA sequences, each with a random length between 15 and 30 bases. I ran RNAfold on this set, and RNAcofold on the same set, using the reverse complement of each sequence as the second strand. Here are the results for RNAfold:
And the results for RNAcofold:
The trends are clearly linear over this length range, so sequence length should be included as a feature in the model.
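For anyone wanting to reproduce the setup (30,000 random sequences of 15 to 30 bases, each paired with its reverse complement for RNAcofold), here is a minimal Python sketch of the input preparation. It assumes a uniform length distribution and uniform base composition, which the original procedure does not specify; the file names, the seed and the helper names are hypothetical. RNAcofold reads the two strands of a dimer concatenated with `&`.

```python
import random

BASES = "ACGU"
# Watson-Crick complements for RNA: A<->U, C<->G
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def random_rna(rng, min_len=15, max_len=30):
    """Random RNA sequence; length drawn uniformly from [min_len, max_len] (assumption)."""
    length = rng.randint(min_len, max_len)  # randint includes both endpoints
    return "".join(rng.choice(BASES) for _ in range(length))


def reverse_complement(seq):
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return seq.translate(COMPLEMENT)[::-1]


def write_inputs(n=30000, seed=0,
                 fold_path="fold_input.txt", cofold_path="cofold_input.txt"):
    """Write FASTA-style inputs for RNAfold and RNAcofold (hypothetical file names)."""
    rng = random.Random(seed)
    seqs = [random_rna(rng) for _ in range(n)]
    with open(fold_path, "w") as f:
        for i, s in enumerate(seqs):
            f.write(f">seq{i}\n{s}\n")
    with open(cofold_path, "w") as f:
        for i, s in enumerate(seqs):
            # RNAcofold expects the two strands joined by '&'
            f.write(f">seq{i}\n{s}&{reverse_complement(s)}\n")
    return seqs
```

After calling `write_inputs()`, the files can be folded with `RNAfold --noPS < fold_input.txt > fold_output.txt` and `RNAcofold --noPS < cofold_input.txt > cofold_output.txt`; `--noPS` skips writing a PostScript drawing for each of the 30,000 structures.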
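One way to put a number on how linear the length trend is: fit energy against length by ordinary least squares and report the Pearson correlation. The sketch below assumes ViennaRNA's default text output (an optional `>name` header, the sequence line, then a structure line ending in the energy in parentheses, with no extra `-p` partition-function lines). For RNAcofold's `A&B` sequences it uses the length of the first strand as the x value. The helper names are hypothetical.

```python
import math


def parse_vienna_output(lines):
    """Pair each sequence with its MFE from RNAfold/RNAcofold text output.

    Assumes the default format: optional '>name' header, the sequence line,
    then a structure line ending in '(energy)'.
    """
    records = []
    seq = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        if line.endswith(")"):
            if seq is not None:
                # The energy is inside the last pair of parentheses, e.g. "( -3.40)"
                energy = float(line.rsplit("(", 1)[1].rstrip(")"))
                records.append((seq, energy))
            seq = None
        else:
            seq = line  # sequence line (contains '&' for RNAcofold)
    return records


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r is undefined when every energy is identical
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r


def length_vs_energy(records):
    """Fit energy against length; for 'A&B' dimers, use the first strand's length."""
    xs = [len(seq.split("&")[0]) for seq, _ in records]
    ys = [energy for _, energy in records]
    return linear_fit(xs, ys)
```

For example, `length_vs_energy(parse_vienna_output(open("fold_output.txt")))` returns the slope, intercept and `r`; `r` squared is the fraction of the variance in energy that a straight line in length explains.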