Depth is the distance of a atom/residue to its closest molecule of bulk solvent.
The protein molecule of interest is placed at the center of a pre-equilibrated box of solvent
(water). Full atomic water model SPC216 (generated using GROMACS [Hess et at, 2008] genbox with
spc216.gro structural file) is used here.
The water molecules that clash with atoms of the protein (within 2.6Å of
protein atoms) are removed from the box.
Other than the clashing water molecules, non-bulk waters are also removed from the box.
Non-bulk waters are those that are trapped in cavities (Figure 1) and isolated from the bulk solvent.
Isolated water are detected by inspecting the number of water molecules in its immediate neighborhood.
A water molecule is considered non-bulk if there are less than a specified
minimum number of neighborhood waters
(default value = 4) within a spherical volume of a specified solvent neighborhood radius(default
value 4.2Å, Figure 2).
The removal of a cavity water causes its immediately neighboring waters to lose one neighborhood
water molecule. For this reason, the check and removal of non-bulk waters is iterated until
there is no further removal of water from the solvent box.
For practical reasons, users are advised to vary the minimum number of
neighborhood waters in the range 1 - 5. Checking for larger number of neighborhood
waters often results in the removal of all water molecules.
The bulk water surrounding a protein is freely diffusing. To mimic this dynamics of bulk
water, the protein is solvated repeatedly, each time in a different orientation. New
orientations are generated by rotating the protein by a random angle about an axis passing
through its center of mass, and translating it along the X-axis to a random distance < 2.8 Å
(the average distance between neighboring water molecule in the box). Each solvation of the
protein is considered to represent a snapshot of the dynamics of bulk-water. With sufficient
number of solvations, water molecules can explore all regions accessible to bulk solvent, hence
mimicking bulk-water dynamics (Figure 3).
At each solvation iteration, the value of atom/residue depth is computed as the distance between
the atom/residue to the closest molecule of bulk water. Depth is finally reported as the average
depth over all solvation iterations. The user can specify the 'number
of solvation cycles '(default = 25).
Note: Run time scales linearly with number of solvation cycles.
A binding cavity is a protein sub-structure of conserved geometrical and chemical properties complimentary to its bound ligand. Using a training-set of ligand bound high resolution crystal structures of proteins, residue depth and solvent-accessible area values were computed for all residues. The probability of individual amino acids to form part of the binding cavity is parametrized by the residue depth, accessible area value pairs (Figure 4)
For an query protein, solvent accessibility and depth are computed for all residues. Residues are assigned binding cavity probability values corresponding to solvent accessibility, residue depth value pairs. (Figure 5). If evolutionary information is used in making the predictions, 3 iterations of PSI-BLAST is used to create a multiple sequence alignment of homologues of the query (e-value cut-off of 0.00001). From this multiple sequence alignment a entropy value is computed and the probability value of the residues are then an average of the Depth/ASA prediction probability values and the entropy probability values.
pKa is a measure of acidic strength and is defined as the logarithm of the acid dissociation constant. pKa of protein residues estimates the protonation strength of its ionizable groups. Ionizable residues play a significant role in several protein functions including folding, stability, solubility, protein-protein interactions etc. To gain insight into these function, it is often crucial to accurately determine the pKa of ionizable residues. The pKa values of ionizable groups are however highly sensitive to their enviroment. This sensitivity makes pKa estimation difficult. Here we introduce a simple method for pKa prediction. pKa predictions are made for the following ionizable residues: ASP, GLU, HIS and LYS residues using the formula: \begin{equation} \begin{array}{c} p K a_{\text {predicted }}=p K a_{\text {model }}+a_{1} * M C_{\text {depth }}+a_{2} * \text { polarSC }_{\text {depth }}+a_{3} *(\text { no.donors })+a_{4} *(\text { no. of acceptors }) \\ +a_{5} *(\text { elec })+a_{6} *\left(A S A_{\text {SC }}\right) \end{array} \end{equation}
where the predicted pKa is a correction to the model pKa. The model pKa for a particular amino acid residue is determined for the case when the titratable group is completely accessible to the solvent and minimally pertubed by the surrounding environment. The correction terms are in the form of a linear combination of 6 different features including main chain depth (MCdepth), polar side chain depth (polarSCdepth), number of H-bonds donors and acceptors (no. HbondsN and no. HbondsO), the electrostatic potential centred at the titratable group considering all partial charges within a cut-off distance of 8Å (elec) and the solvent accessible area of the side chain (ASASC) .
The server attempts to identify a small set of amino acid residues in a query protein that have a high probability of being buried (side-chain accessibilities less than 5%, expressed in terms of residue depth). The server suggests substitutions at these buried positions that are most likely to result in a temperature sensitive (Ts) phenotype.
The Ts phenotype has been shown to correlate with decreased protein stability and reduced levels of the protein, in vivo.
It has also been shown that substitutions of buried hydrophobic residues often result in significant destabilization of the protein, often much larger changes in protein stability than mutations at surface positions. Hence, our approach to predict substutions that result in a Temperature-sensitive mutant is to predict positions of hydrophobic residues in the protein that are likely to be buried.
Several cases of substitution of buried residue positions have been shown to result in a Ts phenotype,for example in the case of T4 lysozyme, and gene V protein.
The Rose Hydrophobicity scale is chosen to quantify the hydrophobicity of amino acid residues, as it most closely correlates with the degree of residue burial. In this study, the hydrophobicity values of the scale were chosen to be equal to the average extent of burial of the residue in the training set, i.e.
\[B_{i}=\left(A_{o i}-A_{i}\right) / A_{o i} \cdot 100 \%\]
7 types of amino acid residues with rescaled hydrophobicity greater than 80, namely Cys, Phe, Ile, Val, Trp, Met, and Leu are defined as hydrophobic residues in this study. As Cys could be involved in disulfide bonds or metal ion coordination, they are not included in the prediction candidates.
Amino Acid | Hydrophobicity | Amino Acid | Hydrophobicity | Amino Acid | Hydrophobicity | Amino Acid | Hydrophobicity |
---|---|---|---|---|---|---|---|
Cys | 100 | Met | 85 | Gly | 51 | Asn | 28 |
Phe | 92 | Leu | 85 | Thr | 46 | Gln | 26 |
Ile | 92 | His | 67 | Ser | 36 | Glu | 26 |
Val | 87 | Tyr | 62 | Arg | 31 | Asp | 26 |
Trp | 85 | Ala | 56 | Pro | 31 | Lys | 0 |
Burial is quantified using two parameters: 1) average hydrophobicity and 2) Hydrophobic moment.
Average Hydrophobicity of a residue (averaged over a seven residue window) is given by:
\[H_{a v}(j)=\sum_{n-j-3}^{j+3} H(n) / 7\]
where the H(n)s are the rescaled individual residue hydrophobicities listed above.
The hydrophobic moment, Hmom is calculated over a nine residue window as follows:
\[H_{m o m}(j)=\left\{\left[\sum_{n-j-4}^{j+4} H(n) \sin (\delta \cdot n)\right]^{2}+\left[\sum_{n-j-4}^{n+4} H(n) \cos (\delta \cdot n)\right]^{2}\right\}^{1 / 2}\]
where δ is the phase angle and is dependent on the periodicity of the secondary structure that the sequence is assumed to adopt:
Hydrophobic moment is introduced because both helices and β strands often have one solvent exposed hydrophilic face and one buried hydrophobic face. Buried regions of such sequences therefore cannot be identified using only Hav. In contrast, they can be indentified by average Hav and high Hmom values.
The following prediction rules were generated by large scale analysis of data ( Varadarajan et al, 1996).
Burial Prediction | Confidence Level(%) Prediction criteria |
---|---|
>=95 | Residue, as well as both flanking residues, are hydrophobic and Hav >= 75 |
>=90 |
|
>=80 | Residue is hydrophobic and any of the following conditions are met:
|
The free energies of unfolding of typical globular proteins are in the range of 5 - 15 kcal/mol at room temperature. A Ts mutation should destabilize the protein by an amount that is an appreciable fraction of the free energy of folding at the nonpermissive temperature.
The exact amount of destabilization produced by a mutation will depend on the effect of the mutation on ΔG, ΔH, and ΔCp. In general, these will not be known for the protein of interest. It is therefore desirable to make both conservative and nonconservative substitutions at predicted buried sites, so that at least one of these will result in a Ts phenotype.
Our approach is therefore to suggest five different substitutions at each predicted buried site that differ in the stereochemistry and polarity of the substituted residue. These substitutions span a wide range of free energy and, we assume that at least one of these substitutions would destabilize the protein to an extent appropriate for a Ts phenotype.
One effective set of stereochemically diverse set of residues was found to be {Ala, Trp, Gln, Asp, Pro}