PLoS One. 2010 Feb 3;5(2):e9044. doi: 10.1371/journal.pone.0009044

Table 2. Top ten mutations for each RC dataset according to weights derived from the initial linear SVR.

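| Rank | Monogram mutation | Influence (Monogram) | Erlangen rank | Erlangen mutation | Influence (Erlangen) | Monogram rank |
|------|-------------------|----------------------|---------------|-------------------|----------------------|---------------|
| 1 | RT M184V | dec. | 19 | RT Q207E | inc. | 240 |
| 2 | PR K43T | dec. | 568 | PR V82A | dec. | 127 |
| 3 | RT A158S | dec. | 126 | RT Y181C | inc. | 150 |
| 4 | PR Q92R | dec. | 401 | RT T215Y | dec. | 18 |
| 5 | PR I64L | dec. | 886 | RT K20I | inc. | 49 |
| 6 | PR K55R | dec. | 602 | PR I13V | dec. | 132 |
| 7 | PR E34K | dec. | 483 | RT E122K | inc. | |
| 8 | PR I47V | dec. | 366 | RT L74V | inc. | 141 |
| 9 | PR V32I | dec. | 131 | RT S162C | inc. | 255 |
| 10 | PR P39S | dec. | 141 | RT T39E | dec. | 267 |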

For each mutation, the table lists its influence on RC relative to the wild type (“dec.” for “decreasing”, “inc.” for “increasing”) and its position in the feature ranking of the other dataset. With the exception of RT A158S, PR I64L, PR P39S, RT Q207E, RT E122K, RT S162C, and RT T39E, all of these mutations are known to be associated with HIV drug resistance and/or fitness [20], [31]. In total, the two feature rankings comprise 878 mutations from the Monogram dataset and 1018 mutations from the Erlangen dataset; the difference arises mainly because fewer sequence positions are included in the Monogram genotypes. Note that the mutation RT E122K does not occur in the Monogram ranking: in the Monogram dataset, lysine (K), not the wild-type glutamic acid (E), is the consensus amino acid at position 122 of the RT sequence, so E122K was removed from the training dataset in the input coding phase. The clear dominance of RC-decreasing mutations in the Monogram dataset may be partly due to the stronger bias towards low-RC samples in this dataset (median measured RC of 38.45%, compared to 46.47% in the Erlangen dataset; see also Figure 1).
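
A table of this shape can be derived from two sets of per-mutation model weights. The following is a minimal sketch, not the authors' code. It assumes that features are ranked by the absolute value of their linear SVR weight and that the sign of a weight gives the direction of influence; the function names and all weight values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str, str, Optional[int]]


def rank_features(weights: Dict[str, float]) -> List[str]:
    """Order mutations by decreasing absolute weight (assumed ranking rule)."""
    return sorted(weights, key=lambda m: abs(weights[m]), reverse=True)


def influence(weight: float) -> str:
    """Map the sign of a (nonzero) weight to the table's influence label."""
    return "inc." if weight > 0 else "dec."


def top_table(own: Dict[str, float], other: Dict[str, float], k: int) -> List[Row]:
    """Top-k mutations of `own`, each with its rank in `other`.

    The cross-rank is None when the mutation is absent from the other
    ranking, as is the case for RT E122K in the Monogram dataset.
    """
    other_rank = {m: i for i, m in enumerate(rank_features(other), start=1)}
    return [
        (i, m, influence(own[m]), other_rank.get(m))
        for i, m in enumerate(rank_features(own)[:k], start=1)
    ]


if __name__ == "__main__":
    # Hypothetical toy weights, NOT values from the study.
    monogram = {"RT M184V": -0.9, "PR K43T": -0.5, "RT T215Y": -0.1}
    erlangen = {"RT Q207E": 0.8, "RT E122K": 0.6, "RT T215Y": -0.4, "RT M184V": -0.3}
    for row in top_table(erlangen, monogram, k=3):
        print(row)
```

Returning `None` for a missing cross-rank keeps the absent entry explicit rather than substituting a placeholder rank, which mirrors the empty Monogram rank cell for RT E122K.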