cospisupport@iiserpune.ac.in

Benchmarks




Prediction Benchmark on experimentally validated cases

We benchmarked our method by examining the agreement between our predictions and experimentally validated Ts mutants. The benchmark comprised 36 mutants from a set of 6 proteins for which extensive mutagenesis data exist: gene V (PDB:1YHA), lambda repressor (PDB:1LMB), T4 lysozyme (PDB:2LZM), CcdB (PDB:3VUB), Gal4 (PDB:3COQ) and Ura3 (PDB:1DQW). Results from both the sequence-based and structure-based methods are reported in Table 1.

Note that our methods also made 71 other predictions in these 6 proteins that are yet to be experimentally validated. They are listed in supplementary data (part B).


Prediction Benchmark using homology models of varying accuracy

For a query sequence without a PDB entry, structural information was inferred from a homology model. The accuracy of such a model is largely determined by the sequence identity between the query and the structural template used. To gauge the effect of template sequence identity on Ts mutant prediction performance, we built models of T4 lysozyme using 15 templates with sequence identities ranging from 22.1% to 90.8%. T4 lysozyme (PDB:2LZM) was chosen for this benchmark because structural templates spanning a wide range of sequence identities are available in the PDB. The benchmark results are shown in Table 2.
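To make the notion of template sequence identity concrete, the following minimal sketch computes percent identity from a pairwise alignment. It is an illustration only: the function name `percent_identity`, the choice to count only columns where neither sequence has a gap, and the toy aligned strings are assumptions, and the source does not state how the identities in Table 2 were calculated.

```python
def percent_identity(aligned_query: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: the denominator is the number of gap-free columns. Other
    conventions (e.g. dividing by the query length) give different values.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if q == t:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned strings, made up for illustration (not real alignment data).
identity = percent_identity("MNIF-EML", "MNLFDEML")
print(f"{identity:.1f}% identity")
```

Because the denominator convention changes the reported value, any comparison with the identities listed for the templates should use the same convention throughout.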



Table 1: Prediction Benchmark on experimentally validated cases

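| Protein | PDB ID | Chain length | Residue position | Residue type | Prediction method |
|---|---|---|---|---|---|
| gene V | 1YHA | 87 | 35 | VAL | Both |
| | | | 45 | VAL | Both |
| | | | 47 | ILE | Structure |
| | | | 63 | VAL | Structure |
| | | | 81 | LEU | Structure |
| | | | 78 | ILE | Sequence |
| lambda repressor | 1LMB | 92 | 51 | PHE | Both |
| | | | 65 | LEU | Both |
| | | | 76 | PHE | Both |
| | | | 84 | ILE | Both |
| | | | 18 | LEU | Structure |
| | | | 36 | VAL | Structure |
| | | | 47 | VAL | Structure |
| T4 lysozyme | 2LZM | 164 | 6 | MET | Both |
| | | | 102 | MET | Both |
| | | | 149 | VAL | Structure |
| | | | 153 | PHE | Structure |
| | | | 103 | VAL | Sequence |
| CcdB | 3VUB | 101 | 17 | PHE | Both |
| | | | 18 | VAL | Both |
| | | | 33 | VAL | Both |
| | | | 34 | ILE | Both |
| | | | 54 | VAL | Both |
| | | | 5 | VAL | Structure |
| | | | 36 | LEU | Structure |
| | | | 63 | MET | Structure |
| | | | 50 | LEU | Sequence |
| | | | 53* | VAL | Sequence |
| | | | 96 | LEU | Sequence |
| | | | 97 | MET | Sequence |
| | | | 98 | PHE | Sequence |
| Gal4 | 3COQ | 88 | 68 | PHE | Both |
| | | | 69 | LEU | Sequence |
| | | | 70 | LEU | Sequence |
| Ura3 | 1DQW | 267 | 25 | MET | Structure |
| | | | 32 | LEU | Structure |
| | | | 118 | ILE | Structure |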

Table 1: Ts mutant positions as predicted by the sequence-based method, the structure-based method, or both. The wild-type residue (three-letter amino acid code) at each position is listed under residue type. * The prediction of VAL53 in CcdB as a Ts mutant position is a false positive.
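As a cross-check on the benchmark size, the rows of Table 1 can be tallied by prediction method. The sketch below is an illustration only: the data layout and the names `TABLE1`, `FALSE_POSITIVES` and `tally_by_method` are hypothetical and not part of the prediction server; the values are transcribed from Table 1.

```python
from collections import Counter

# Rows transcribed from Table 1: protein -> list of (position, residue, method).
TABLE1 = {
    "gene V": [(35, "VAL", "Both"), (45, "VAL", "Both"), (47, "ILE", "Structure"),
               (63, "VAL", "Structure"), (81, "LEU", "Structure"), (78, "ILE", "Sequence")],
    "lambda repressor": [(51, "PHE", "Both"), (65, "LEU", "Both"), (76, "PHE", "Both"),
                         (84, "ILE", "Both"), (18, "LEU", "Structure"),
                         (36, "VAL", "Structure"), (47, "VAL", "Structure")],
    "T4 lysozyme": [(6, "MET", "Both"), (102, "MET", "Both"), (149, "VAL", "Structure"),
                    (153, "PHE", "Structure"), (103, "VAL", "Sequence")],
    "CcdB": [(17, "PHE", "Both"), (18, "VAL", "Both"), (33, "VAL", "Both"),
             (34, "ILE", "Both"), (54, "VAL", "Both"), (5, "VAL", "Structure"),
             (36, "LEU", "Structure"), (63, "MET", "Structure"), (50, "LEU", "Sequence"),
             (53, "VAL", "Sequence"), (96, "LEU", "Sequence"), (97, "MET", "Sequence"),
             (98, "PHE", "Sequence")],
    "Gal4": [(68, "PHE", "Both"), (69, "LEU", "Sequence"), (70, "LEU", "Sequence")],
    "Ura3": [(25, "MET", "Structure"), (32, "LEU", "Structure"), (118, "ILE", "Structure")],
}

# The starred entry in Table 1 (VAL53 in CcdB) is a false positive.
FALSE_POSITIVES = {("CcdB", 53)}


def tally_by_method(table, false_positives):
    """Count table rows per prediction method, skipping known false positives."""
    counts = Counter()
    for protein, rows in table.items():
        for position, _residue, method in rows:
            if (protein, position) in false_positives:
                continue
            counts[method] += 1
    return counts


counts = tally_by_method(TABLE1, FALSE_POSITIVES)
for method in ("Both", "Structure", "Sequence"):
    print(method, counts[method])
print("Total", sum(counts.values()))
```

With the false positive excluded, the per-method counts sum to the 36 benchmark mutants.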


Table 2: Prediction Benchmark using homology models of varying accuracy

Column groups: template quality (PDB:chain, Sequence Id (%), DOPE, GA341); number of predictions (Sequence, Structure, Both); experimentally validated mutant positions (M6, M102, V103, V149, F153).

| PDB:chain | Sequence Id (%) | DOPE | GA341 | Sequence | Structure | Both | M6 | M102 | V103 | V149 | F153 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1pqj:A | 90.8 | -2.08 | 1.00 | 3 (8) | 4 (11) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1d3n:A | 86.1 | -1.96 | 1.00 | 3 (8) | 4 (11) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1t8a:A | 81.6 | -1.65 | 1.00 | 3 (8) | 4 (13) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1cx6:A | 79.9 | -2.03 | 1.00 | 3 (8) | 4 (11) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1lpy:A | 78.8 | -1.95 | 1.00 | 3 (8) | 4 (10) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1swz:A | 77.5 | -2.21 | 1.00 | 3 (8) | 4 (13) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1lwk:A | 77.0 | -1.74 | 1.00 | 3 (8) | 4 (12) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1swy:A | 74.5 | -2.22 | 1.00 | 3 (8) | 4 (12) | 2 (3) | Both | Both | Sequence | Structure | Structure |
| 1sx2:A | 72.4 | -2.28 | 1.00 | 3 (8) | 4 (13) | 2 (4) | Both | Both | Sequence | Structure | Structure |
| 1wth:A | 43.2 | -1.49 | 1.00 | 3 (8) | 3 (12) | 1 (2) | Sequence | Both | Sequence | Structure | Structure |
| 1k28:A | 43.2 | -1.43 | 1.00 | 3 (8) | 5 (15) | 3 (5) | Both | Both | Both | Structure | Structure |
| 2anv:A | 24.2 | 0.54 | 0.12 | 3 (8) | 1 (11) | 1 (2) | Sequence | Both | Sequence | -- | -- |
| 2anx:B | 23.9 | 0.60 | 0.08 | 3 (8) | 2 (11) | 2 (3) | Sequence | Both | Both | -- | -- |
| 2anv:B | 23.5 | 0.49 | 0.13 | 3 (8) | 2 (11) | 2 (3) | Sequence | Both | Both | -- | -- |
| 2anx:A | 22.1 | 0.64 | 0.13 | 3 (8) | 1 (10) | 1 (2) | Sequence | Both | Sequence | -- | -- |


Table 2: Ts mutant prediction in T4 lysozyme using homology models (identified by their templates) of varying accuracy. For each model, the number of experimentally validated predictions made by the sequence-based method, the structure-based method, or both is listed, with the total number of predictions in brackets. How each model performs at the individual experimentally validated mutant positions is shown in separate columns.
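The count cells in Table 2 (for example `4 (11)`) can be turned into a validated fraction per template. The sketch below is an illustration under a stated assumption: the first number is read as the count of experimentally validated predictions and the bracketed number as the total number of predictions, which is the reading consistent with the per-position columns (for 1pqj:A, four of the five validated positions are marked Structure or Both). The names `parse_cell`, `structure_fractions` and `ROWS` are hypothetical, and only three rows are transcribed.

```python
import re

# Matches cells of the form "4 (11)".
CELL = re.compile(r"^\s*(\d+)\s*\((\d+)\)\s*$")


def parse_cell(cell: str) -> tuple[int, int]:
    """Split a count cell into (validated, total) under the reading described above."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognised count cell: {cell!r}")
    return int(match.group(1)), int(match.group(2))


# A few rows transcribed from Table 2:
# (template, sequence identity %, sequence cell, structure cell, both cell)
ROWS = [
    ("1pqj:A", 90.8, "3 (8)", "4 (11)", "2 (3)"),
    ("1wth:A", 43.2, "3 (8)", "3 (12)", "1 (2)"),
    ("2anx:A", 22.1, "3 (8)", "1 (10)", "1 (2)"),
]


def structure_fractions(rows):
    """Map each template to validated/total for the structure-based predictions."""
    fractions = {}
    for template, _identity, _seq_cell, struct_cell, _both_cell in rows:
        validated, total = parse_cell(struct_cell)
        fractions[template] = validated / total if total else 0.0
    return fractions


for template, fraction in structure_fractions(ROWS).items():
    print(f"{template}: {fraction:.2f} of structure-based predictions validated")
```

The sequence-based cell is identical across all templates in the table, so only the structure-based and combined columns vary with template sequence identity.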