Depth is the distance of an atom/residue from its closest molecule of bulk solvent.
The protein molecule of interest is placed at the center of a pre-equilibrated box of solvent
(water). The full-atom SPC216 water model (generated using GROMACS [Hess et al., 2008] genbox with
the spc216.gro structure file) is used here.
Water molecules that clash with atoms of the protein (within 2.6 Å of protein atoms) are removed from the box.
In addition to the clashing water molecules, non-bulk waters are removed from the box.
Non-bulk waters are those that are trapped in cavities (Figure 1) and isolated from the bulk solvent.
Isolated waters are detected by counting the water molecules in their immediate neighborhood.
A water molecule is considered non-bulk if fewer than a specified
minimum number of neighborhood waters
(default value = 4) lie within a sphere of a specified solvent neighborhood radius (default
value = 4.2 Å, Figure 2).
The removal of a cavity water causes its immediately neighboring waters to lose one neighborhood water molecule. For this reason, the check for and removal of non-bulk waters is iterated until no further water is removed from the solvent box. For practical reasons, users are advised to vary the minimum number of neighborhood waters in the range 1 to 5. Requiring a larger number of neighborhood waters often results in the removal of all water molecules.
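The clash filter and the iterated non-bulk check described above can be sketched as follows. This is a minimal illustration, not the server's actual code. The function names are hypothetical, each water is assumed to be reduced to a single point (for example its oxygen), and treating a distance exactly equal to a cutoff as "within" is also an assumption. The brute-force neighbor count is quadratic in the number of waters; a real implementation would likely use a spatial grid.

```python
import math

def remove_clashing(waters, protein_atoms, clash_cutoff=2.6):
    """Keep only waters farther than clash_cutoff (in Å) from every protein atom."""
    return [w for w in waters
            if all(math.dist(w, a) > clash_cutoff for a in protein_atoms)]

def remove_non_bulk(waters, min_neighbors=4, radius=4.2):
    """Repeatedly drop waters that have fewer than min_neighbors other
    waters within radius (in Å), until a pass removes nothing."""
    waters = list(waters)
    while True:
        kept = [
            w for i, w in enumerate(waters)
            if sum(1 for j, other in enumerate(waters)
                   if j != i and math.dist(w, other) <= radius) >= min_neighbors
        ]
        if len(kept) == len(waters):  # no water removed in this pass: converged
            return kept
        waters = kept  # removals may expose new non-bulk waters, so check again
```

Each pass counts neighbors against the waters that survived the previous pass, which is why a single removal can cascade through a poorly connected pocket of solvent.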
The bulk water surrounding a protein diffuses freely. To mimic the dynamics of bulk
water, the protein is solvated repeatedly, each time in a different orientation. New
orientations are generated by rotating the protein by a random angle about an axis passing
through its center of mass, and translating it along the X-axis by a random distance < 2.8 Å
(the average distance between neighboring water molecules in the box). Each solvation of the
protein is considered to represent a snapshot of the dynamics of bulk water. With a sufficient
number of solvations, water molecules can explore all regions accessible to bulk solvent, hence
mimicking bulk-water dynamics (Figure 3).
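One way to generate such a random placement is sketched below, using Rodrigues' rotation formula. The text does not say how the rotation axis direction is chosen, so drawing it uniformly at random is an assumption, as are all function names and the use of per-atom masses supplied by the caller.

```python
import math
import random

def center_of_mass(coords, masses):
    """Mass-weighted mean position of the atoms."""
    total = sum(masses)
    return tuple(sum(m * c[k] for c, m in zip(coords, masses)) / total
                 for k in range(3))

def random_unit_vector(rng):
    """Uniformly distributed direction, from a normalized 3D Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:
            return [x / norm for x in v]

def rotate_about_axis(p, axis, angle):
    """Rodrigues' rotation of point p about a unit axis through the origin."""
    c, s = math.cos(angle), math.sin(angle)
    dot = sum(a * b for a, b in zip(axis, p))
    cross = (axis[1] * p[2] - axis[2] * p[1],
             axis[2] * p[0] - axis[0] * p[2],
             axis[0] * p[1] - axis[1] * p[0])
    return tuple(p[k] * c + cross[k] * s + axis[k] * dot * (1.0 - c)
                 for k in range(3))

def reorient(coords, masses, rng, max_shift=2.8):
    """Rotate about a random axis through the center of mass by a random
    angle, then translate along X by a random distance in [0, max_shift) Å."""
    com = center_of_mass(coords, masses)
    axis = random_unit_vector(rng)
    angle = rng.random() * 2.0 * math.pi
    shift = rng.random() * max_shift
    moved = []
    for p in coords:
        rel = tuple(p[k] - com[k] for k in range(3))
        r = rotate_about_axis(rel, axis, angle)
        moved.append((r[0] + com[0] + shift, r[1] + com[1], r[2] + com[2]))
    return moved
```

Passing an explicit `random.Random` instance as `rng` makes a series of solvation cycles reproducible from a single seed.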
At each solvation iteration, the depth of an atom/residue is computed as the distance from the atom/residue to the closest molecule of bulk water. Depth is finally reported as the average over all solvation iterations. The user can specify the 'number of solvation cycles' (default = 25).
Note: Run time scales linearly with number of solvation cycles.
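The averaging over solvation iterations can be illustrated with the short sketch below. It covers atom depth only and assumes each bulk water is represented by a single point. The function names and the `snapshots` data layout are hypothetical.

```python
import math

def atom_depths(atoms, bulk_waters):
    """Distance from each atom to its closest bulk water, for one solvation."""
    return [min(math.dist(a, w) for w in bulk_waters) for a in atoms]

def average_depths(snapshots):
    """Average per-atom depth over all solvation cycles.

    snapshots is a list of (atoms, bulk_waters) pairs, one per cycle,
    with the atoms listed in the same order in every cycle.
    """
    n_atoms = len(snapshots[0][0])
    totals = [0.0] * n_atoms
    for atoms, bulk_waters in snapshots:
        for i, depth in enumerate(atom_depths(atoms, bulk_waters)):
            totals[i] += depth
    return [t / len(snapshots) for t in totals]
```

Every cycle repeats the full nearest-water search, which is consistent with run time growing linearly with the number of cycles.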
A binding cavity is a protein sub-structure whose conserved geometrical and chemical properties are complementary to those of its bound ligand. Using a training set of ligand-bound, high-resolution crystal structures of proteins, residue depth and solvent-accessible area values were computed for all residues. The probability that an individual amino acid forms part of a binding cavity is parametrized by its (residue depth, accessible area) value pair (Figure 4).
For a query protein, solvent accessibility and depth are computed for all residues. Each residue is assigned the binding cavity probability corresponding to its (solvent accessibility, residue depth) value pair (Figure 5). If evolutionary information is used in making the predictions, 3 iterations of PSI-BLAST are run to create a multiple sequence alignment of homologues of the query (e-value cut-off of 0.00001). From this multiple sequence alignment an entropy value is computed, and the probability value of each residue is then the average of the Depth/ASA prediction probability and the entropy probability.
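The evolutionary step can be sketched as below. The text does not specify the entropy definition, so Shannon entropy over the residues of one alignment column (ignoring gaps) is an assumption. It also does not say how an entropy value is converted into a probability, so that value is taken here as an input. The function names are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps.

    Assumption: the source does not say which entropy measure is used.
    """
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def combined_probability(p_depth_asa, p_entropy):
    """Average of the Depth/ASA probability and the entropy-based probability."""
    return 0.5 * (p_depth_asa + p_entropy)
```

A fully conserved column has zero entropy, and a column split evenly between two residue types has an entropy of one bit.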