Prediction Benchmark on experimentally validated cases
We benchmarked our method by examining the agreement between our predictions and experimentally validated Ts mutants. The benchmark comprised 36 mutants from a set of 6 proteins for which extensive mutagenesis data exist: gene V (PDB:1YHA), lambda repressor (PDB:1LMB), T4 lysozyme (PDB:2LZM), CcdB (PDB:3VUB), Gal4 (PDB:3COQ) and Ura3 (PDB:1DQW). Results from both the sequence-based and structure-based methods are reported in Table 1.
Note that our methods also made 71 other predictions in these 6 proteins that are yet to be experimentally validated. They are listed in supplementary data (part B).
Prediction Benchmark using homology models of varying accuracy
For a query sequence without a PDB entry, structural information was inferred from a homology model. The accuracy of such a model is largely determined by its sequence identity to the structural template used. To gauge the effect of template sequence identity on Ts mutant prediction performance, we built models of T4 lysozyme using 15 templates of varying sequence identity. T4 lysozyme (PDB:2LZM) was chosen for this benchmark because structural templates spanning a range of sequence identities are available in the PDB. The benchmark results are shown in Table 2.
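To make the notion of template sequence identity concrete, the following Python sketch computes percent identity from a pairwise alignment. It is illustrative only: the function name `percent_identity`, the choice of denominator (columns where neither sequence has a gap) and the toy aligned strings are assumptions, not the procedure used to obtain the values reported in this benchmark.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: identity is normalised by the number of aligned (non-gap)
    columns; other conventions (e.g. dividing by query length) also exist.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")

    aligned = 0    # columns with a residue in both sequences
    identical = 0  # aligned columns where the residues match
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue
        aligned += 1
        if q == t:
            identical += 1

    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


# Toy alignment (hypothetical strings, for illustration only):
# 9 aligned columns, 8 of them identical.
toy_identity = percent_identity("MNIFEMLRID", "MNIFE-LRLD")
```

Because the denominator is a convention, identities computed this way may differ slightly from values reported by a given alignment or modelling tool.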
| Protein | PDB ID | Chain length | Residue position | Residue type | Prediction method |
|---|---|---|---|---|---|
| gene V | 1YHA | 87 | 35 | VAL | Both |
| | | | 45 | VAL | Both |
| | | | 47 | ILE | Structure |
| | | | 63 | VAL | Structure |
| | | | 81 | LEU | Structure |
| | | | 78 | ILE | Sequence |
| lambda repressor | 1LMB | 92 | 51 | PHE | Both |
| | | | 65 | LEU | Both |
| | | | 76 | PHE | Both |
| | | | 84 | ILE | Both |
| | | | 18 | LEU | Structure |
| | | | 36 | VAL | Structure |
| | | | 47 | VAL | Structure |
| T4 lysozyme | 2LZM | 164 | 6 | MET | Both |
| | | | 102 | MET | Both |
| | | | 149 | VAL | Structure |
| | | | 153 | PHE | Structure |
| | | | 103 | VAL | Sequence |
| CcdB | 3VUB | 101 | 17 | PHE | Both |
| | | | 18 | VAL | Both |
| | | | 33 | VAL | Both |
| | | | 34 | ILE | Both |
| | | | 54 | VAL | Both |
| | | | 5 | VAL | Structure |
| | | | 36 | LEU | Structure |
| | | | 63 | MET | Structure |
| | | | 50 | LEU | Sequence |
| | | | 53* | VAL | Sequence |
| | | | 96 | LEU | Sequence |
| | | | 97 | MET | Sequence |
| | | | 98 | PHE | Sequence |
| Gal4 | 3COQ | 88 | 68 | PHE | Both |
| | | | 69 | LEU | Sequence |
| | | | 70 | LEU | Sequence |
| Ura3 | 1DQW | 267 | 25 | MET | Structure |
| | | | 32 | LEU | Structure |
| | | | 118 | ILE | Structure |
Table 1: Ts mutant positions as predicted by the sequence-based method, the structure-based method, or both. The wild-type residue at each position (three-letter amino acid code) is listed under residue type. * The prediction of VAL53 in CcdB as a Ts mutant position is a false positive.
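As a reading aid, the per-method tallies implied by Table 1 can be reproduced with a short Python sketch. The positions below are transcribed from Table 1; the names `TABLE1`, `FALSE_POSITIVES` and `tally` are hypothetical and are not part of the prediction software itself.

```python
from collections import Counter

# Predicted positions transcribed from Table 1, grouped by protein and method.
TABLE1 = {
    "gene V": {"Both": [35, 45], "Structure": [47, 63, 81], "Sequence": [78]},
    "lambda repressor": {"Both": [51, 65, 76, 84], "Structure": [18, 36, 47]},
    "T4 lysozyme": {"Both": [6, 102], "Structure": [149, 153], "Sequence": [103]},
    "CcdB": {
        "Both": [17, 18, 33, 34, 54],
        "Structure": [5, 36, 63],
        "Sequence": [50, 53, 96, 97, 98],
    },
    "Gal4": {"Both": [68], "Sequence": [69, 70]},
    "Ura3": {"Structure": [25, 32, 118]},
}

# Position flagged with * in Table 1 (false positive).
FALSE_POSITIVES = {("CcdB", 53)}


def tally(table, false_positives):
    """Count validated positions per prediction method, skipping false positives."""
    counts = Counter()
    for protein, by_method in table.items():
        for method, positions in by_method.items():
            for pos in positions:
                if (protein, pos) not in false_positives:
                    counts[method] += 1
    return counts


counts = tally(TABLE1, FALSE_POSITIVES)
validated_total = sum(counts.values())

# Positions recovered by each method, including those recovered by both.
by_sequence = counts["Sequence"] + counts["Both"]
by_structure = counts["Structure"] + counts["Both"]
```

With the rows as transcribed, the 36 validated positions split into 14 recovered by both methods, 14 by the structure-based method alone and 8 by the sequence-based method alone.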
| Template PDB:chain | Template sequence Id (%) | Template DOPE | Template GA341 | Predictions: Sequence | Predictions: Structure | Predictions: Both | M6 | M102 | V103 | V149 | F153 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1pqj:A | 90.8 | -2.08 | 1.00 | 3 (8) | 4 (11) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1d3n:A | 86.1 | -1.96 | 1.00 | 3 (8) | 4 (11) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1t8a:A | 81.6 | -1.65 | 1.00 | 3 (8) | 4 (13) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1cx6:A | 79.9 | -2.03 | 1.00 | 3 (8) | 4 (11) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1lpy:A | 78.8 | -1.95 | 1.00 | 3 (8) | 4 (10) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1swz:A | 77.5 | -2.21 | 1.00 | 3 (8) | 4 (13) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1lwk:A | 77.0 | -1.74 | 1.00 | 3 (8) | 4 (12) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1swy:A | 74.5 | -2.22 | 1.00 | 3 (8) | 4 (12) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1sx2:A | 72.4 | -2.28 | 1.00 | 3 (8) | 4 (13) | 2 (4) | Both | Both | Sequence | Structure | Structure |
| 1wth:A | 43.2 | -1.49 | 1.00 | 3 (8) | 3 (12) | 1 (2) | Sequence | Both | Sequence | Structure | Structure |
| 1k28:A | 43.2 | -1.43 | 1.00 | 3 (8) | 5 (15) | 3 (5) | Both | Both | Both | Structure | Structure |
| 2anv:A | 24.2 | 0.54 | 0.12 | 3 (8) | 1 (11) | 1 (2) | Sequence | Both | Sequence | -- | -- |
| 2anx:B | 23.9 | 0.60 | 0.08 | 3 (8) | 2 (11) | 2 (3) | Sequence | Both | Both | -- | -- |
| 2anv:B | 23.5 | 0.49 | 0.13 | 3 (8) | 2 (11) | 2 (3) | Sequence | Both | Both | -- | -- |
| 2anx:A | 22.1 | 0.64 | 0.13 | 3 (8) | 1 (10) | 1 (2) | Sequence | Both | Sequence | -- | -- |
Table 2: Ts mutant prediction in T4 lysozyme using homology models (identified by their templates) of differing accuracy. For each model, the number of experimentally validated positions predicted by the sequence-based method, the structure-based method, or both is listed, with the total number of predictions made in brackets. The method that recovered each experimentally validated mutant position is additionally shown in separate columns; -- indicates that the position was not predicted.
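The per-position columns and the prediction counts in Table 2 can be cross-checked against each other. The Python sketch below counts, for one row, how many of the five validated positions were recovered by each method; the names `POSITIONS` and `validated_counts` are hypothetical, and the two example rows are transcribed from Table 2 (1pqj:A and 2anv:A).

```python
POSITIONS = ("M6", "M102", "V103", "V149", "F153")


def validated_counts(calls):
    """Return (sequence, structure, both) counts for one Table 2 row.

    `calls` maps each validated position to "Sequence", "Structure",
    "Both", or None when the position was not predicted ("--").
    A "Both" call counts towards the sequence and the structure totals.
    """
    seq = sum(1 for c in calls.values() if c in ("Sequence", "Both"))
    struct = sum(1 for c in calls.values() if c in ("Structure", "Both"))
    both = sum(1 for c in calls.values() if c == "Both")
    return seq, struct, both


# Rows transcribed from Table 2.
row_1pqj_A = dict(zip(POSITIONS, ["Both", "Both", "Sequence", "Structure", "Structure"]))
row_2anv_A = dict(zip(POSITIONS, ["Sequence", "Both", "Sequence", None, None]))

high_identity = validated_counts(row_1pqj_A)  # template at 90.8% identity
low_identity = validated_counts(row_2anv_A)   # template at 24.2% identity
```

For 1pqj:A this gives 3, 4 and 2, and for 2anv:A it gives 3, 1 and 1, matching the leading numbers in the corresponding count cells. The sequence-based count is unchanged between the two rows, while the structure-based count drops for the low-identity template.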