RNAfold’s and RNAcofold’s predicted dG correlates with sequence length

This seems rather obvious, but I decided to double-check it before building a machine learning model on RNAfold’s and RNAcofold’s predictions for sequences of varying length.

Method

I generated 30,000 random RNA sequences, each with a random length between 15 and 30 bases. I ran RNAfold on this list, and RNAcofold on the same list with the reverse complement of each sequence as the second strand. Here are the results for RNAfold:

And the results for RNAcofold:

Over this length range (15 to 30 bases) both trends are clearly linear, so sequence length should be included as a feature in the model.
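For anyone who wants to repeat the check, here is a minimal Python sketch of the setup, not the exact script used for this post. It assumes uniformly random bases, and it assumes RNAcofold is given the two strands joined by `&`. All function names are made up for illustration. The dG values themselves are not computed here; they would have to be parsed from the RNAfold and RNAcofold output.

```python
import random

BASES = "ACGU"
# Watson-Crick complement table for RNA
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def random_rna(min_len=15, max_len=30, rng=random):
    """Random RNA sequence with a random length in [min_len, max_len]."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(BASES) for _ in range(length))


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def cofold_input(seq):
    """One RNAcofold input line: the sequence and its reverse complement,
    joined by '&' (assumed input convention)."""
    return f"{seq}&{reverse_complement(seq)}"


def linear_fit(xs, ys):
    """Ordinary least squares fit of ys against xs.

    Returns (slope, intercept, pearson_r).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


if __name__ == "__main__":
    sequences = [random_rna() for _ in range(30000)]
    lengths = [len(s) for s in sequences]
    cofold_lines = [cofold_input(s) for s in sequences]
    # Next step (not shown): run RNAfold on `sequences` and RNAcofold on
    # `cofold_lines`, parse the dG values, then call
    # linear_fit(lengths, dg_values) for each tool.
```

A Pearson r close to -1 or 1 from `linear_fit` would back up the linear trend; a weaker value would suggest trying a different functional form before adding length to the model.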
