SERVER FOR COMPUTING/PREDICTING DEPTH, CAVITY SIZES, LIGAND BINDING SITES AND pKa


DEPTH Algorithm
  1. Solvating the Protein Molecule
  2. Removal of non-bulk waters
  3. Sampling Solvent Configurations
Binding Cavity Prediction
  1. Constructing Binding Cavity Probability Tables for Amino-Acids
  2. Assigning Probability Values onto Protein of Interest
  3. Grouping Cavity Residues
pKa Prediction

Input and Output


DEPTH Algorithm

Depth is the distance of an atom or residue from its closest molecule of bulk solvent.


STEP 1: Solvating the Protein Molecule

The protein molecule of interest is placed at the center of a pre-equilibrated box of solvent (water). The full-atom water model SPC216 is used here (generated using GROMACS [Hess et al., 2008] genbox with the spc216.gro structure file).
Water molecules that clash with atoms of the protein, i.e., are within 2.6 Å of protein atoms, are removed from the box.
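
The clash filter described above can be sketched in a few lines. This is a minimal illustration, not the server's code: the function name and the plain-tuple coordinate representation are assumptions, and a real implementation would use a spatial grid rather than an all-pairs loop.

```python
import math

CLASH_CUTOFF = 2.6  # Å, the clash distance quoted in the text


def remove_clashing_waters(waters, protein_atoms, cutoff=CLASH_CUTOFF):
    """Keep only waters that are at least `cutoff` Å from every protein atom."""
    kept = []
    for water in waters:
        if all(math.dist(water, atom) >= cutoff for atom in protein_atoms):
            kept.append(water)
    return kept


if __name__ == "__main__":
    protein = [(0.0, 0.0, 0.0)]
    waters = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
    # The water at 1.0 Å clashes and is dropped; the other two are kept.
    print(remove_clashing_waters(waters, protein))
```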

STEP 2: Removal of non-bulk waters

Other than the clashing water molecules, non-bulk waters are also removed from the box.
Non-bulk waters are those that are trapped in cavities (Figure 1) and isolated from the bulk solvent.
Isolated waters are detected by counting the water molecules in their immediate neighborhood.
A water molecule is considered non-bulk if it has fewer than a specified minimum number of neighborhood waters (default value = 4) within a spherical volume of a specified solvent neighborhood radius (default value = 4.2 Å, Figure 2).
The removal of a cavity water causes each of its immediate neighbors to lose one neighborhood water molecule. For this reason, the check for and removal of non-bulk waters is iterated until no further water is removed from the solvent box. For practical reasons, users are advised to vary the minimum number of neighborhood waters in the range 1 - 5. Checking for a larger number of neighborhood waters often results in the removal of all water molecules.
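
The iterative check can be sketched as below. This is an illustration under assumptions: the function name is hypothetical, and the text does not say whether waters are removed one at a time or in simultaneous passes, so the sketch removes all failing waters in each pass and repeats until nothing changes.

```python
import math


def remove_nonbulk_waters(waters, min_neighbors=4, radius=4.2):
    """Repeatedly drop waters with fewer than `min_neighbors` other waters
    within `radius` Å, until a pass removes nothing."""
    bulk = list(waters)
    while True:
        survivors = []
        for i, w in enumerate(bulk):
            neighbors = sum(
                1 for j, other in enumerate(bulk)
                if j != i and math.dist(w, other) <= radius
            )
            if neighbors >= min_neighbors:
                survivors.append(w)
        if len(survivors) == len(bulk):
            return survivors
        bulk = survivors


if __name__ == "__main__":
    # 3 x 3 x 3 cubic lattice with 3.0 Å spacing, plus one isolated water.
    lattice = [(3.0 * i, 3.0 * j, 3.0 * k)
               for i in range(3) for j in range(3) for k in range(3)]
    isolated = (100.0, 100.0, 100.0)
    print(len(remove_nonbulk_waters(lattice + [isolated], min_neighbors=3)))  # 27
    # With a stricter minimum the removal cascades inward from the corners.
    print(len(remove_nonbulk_waters(lattice + [isolated], min_neighbors=4)))  # 0
```

The second call shows the cascade the text warns about: on this small lattice, each pass strips the outermost waters until none remain.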
Figure 1   Figure 2
 

STEP 3: Sampling Solvent Configurations

The bulk water surrounding a protein is freely diffusing. To mimic the dynamics of bulk water, the protein is solvated repeatedly, each time in a different orientation. New orientations are generated by rotating the protein by a random angle about an axis passing through its center of mass, and translating it along the X-axis by a random distance < 2.8 Å (the average distance between neighboring water molecules in the box). Each solvation of the protein is considered to represent a snapshot of the dynamics of bulk water. With a sufficient number of solvations, water molecules can explore all regions accessible to bulk solvent, hence mimicking bulk-water dynamics (Figure 3).
At each solvation iteration, the depth of an atom or residue is computed as its distance to the closest molecule of bulk water. Depth is finally reported as the average depth over all solvation iterations. The user can specify the 'number of solvation cycles' (default = 25).
Note: Run time scales linearly with number of solvation cycles.
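
The repeated-solvation loop can be sketched as follows. Everything here is an illustration under stated assumptions: the function names are hypothetical, the center of mass is approximated by the unweighted centroid, the rotation axis is drawn uniformly at random, and the non-bulk water removal of STEP 2 is left out to keep the example short.

```python
import math
import random


def random_unit_vector(rng):
    """Rejection-sample a direction uniformly on the unit sphere."""
    while True:
        v = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if 1e-6 < norm <= 1.0:
            return [c / norm for c in v]


def reorient(coords, rng, max_shift=2.8):
    """Rotate about a random axis through the centroid (Rodrigues' formula),
    then shift along X by a random distance in [0, max_shift)."""
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    k = random_unit_vector(rng)
    theta = rng.random() * 2.0 * math.pi
    shift = rng.random() * max_shift
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    moved = []
    for c in coords:
        v = [c[i] - center[i] for i in range(3)]
        kxv = [k[1] * v[2] - k[2] * v[1],
               k[2] * v[0] - k[0] * v[2],
               k[0] * v[1] - k[1] * v[0]]
        kdv = sum(k[i] * v[i] for i in range(3))
        r = [v[i] * cos_t + kxv[i] * sin_t + k[i] * kdv * (1.0 - cos_t)
             for i in range(3)]
        moved.append((r[0] + center[0] + shift, r[1] + center[1], r[2] + center[2]))
    return moved


def average_depth(coords, water_box, n_cycles=25, clash=2.6, seed=0):
    """Average, over n_cycles solvations, of each atom's distance to the
    nearest non-clashing water. Assumes the box is large enough that some
    waters always survive the clash filter."""
    rng = random.Random(seed)
    totals = [0.0] * len(coords)
    for _ in range(n_cycles):
        placed = reorient(coords, rng)
        waters = [w for w in water_box
                  if all(math.dist(w, a) >= clash for a in placed)]
        for i, atom in enumerate(placed):
            totals[i] += min(math.dist(atom, w) for w in waters)
    return [t / n_cycles for t in totals]
```

The cost of `average_depth` grows in proportion to `n_cycles`, which matches the note above on run time.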

Figure 3: Mimicking solvent dynamics by repeated solvation
Protein molecules from 10 solvation cycles are superposed. The first hydration shell at each solvation is shown consecutively.


Binding Cavity Prediction

STEP 1: Constructing Binding Cavity Probability Tables for Amino-Acids

A binding cavity is a protein sub-structure whose conserved geometrical and chemical properties are complementary to its bound ligand. Using a training set of high-resolution, ligand-bound crystal structures of proteins, residue depth and solvent-accessible area values were computed for all residues. The probability that an individual amino acid forms part of a binding cavity is parametrized by (residue depth, accessible area) value pairs (Figure 4).

Figure 4: Probability of HIS to form part of ligand binding cavity
Similar probabilities were estimated for all amino acids,
and for minimum numbers of neighborhood waters in the range 2-5.

STEP 2: Assigning Probability Values onto Protein of Interest

For a query protein, solvent accessibility and depth are computed for all residues. Each residue is assigned the binding cavity probability corresponding to its (solvent accessibility, residue depth) value pair (Figure 5). If evolutionary information is used in making the predictions, 3 iterations of PSI-BLAST are used to create a multiple sequence alignment of homologues of the query (e-value cut-off of 0.00001). From this multiple sequence alignment an entropy value is computed, and the probability value of each residue is then the average of its Depth/ASA prediction probability and its entropy probability.
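
The averaging step can be illustrated as follows. The text does not say which entropy definition is used or how an entropy value is converted into a probability, so this sketch assumes Shannon entropy per alignment column and takes the entropy-based probability as a given input. Both function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps ('-').
    Assumption: the document's 'entropy value' is of this kind."""
    residues = [c for c in column if c != "-"]
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def combined_probability(p_depth_asa, p_entropy):
    """Arithmetic mean of the Depth/ASA probability and the entropy-based
    probability, as described in the text."""
    return (p_depth_asa + p_entropy) / 2.0


if __name__ == "__main__":
    print(column_entropy("HHHH"))   # fully conserved column
    print(column_entropy("HHYY"))   # two residue types, equally frequent
    print(combined_probability(0.8, 0.4))
```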

Figure 5: Assigning probability value by
solvent accessibility, residue depth value pairs
PDB:101M Sperm Whale Myoglobin bound with ligand heme

STEP 3: Grouping Cavity Residues

  1. All residues with probability values above a user definable cavity prediction probability threshold are selected as binding cavity residues.
  2. A 6.2 Å sphere is built around each of these selected residues.
  3. Starting from the residue with highest probability value, all other selected residues within this sphere are merged into the same binding cavity.
  4. The process is repeated until no further merger occurs.
  5. Finally, a 3.6 Å sphere is built around every residue within each cavity.
  6. All solvent-accessible residues (side-chain accessibility > 30%) that fall within these spheres are also grouped into the binding cavities.
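
The six steps above can be sketched as a greedy grouping. This is a simplified illustration, not the server's implementation. It assumes each residue is represented by a single point (for example a side-chain centroid), treats "within the sphere" as a point-to-point distance test, and uses a hypothetical function name and input layout.

```python
import math


def group_cavities(residues, threshold=0.5, merge_radius=6.2,
                   grow_radius=3.6, min_access=30.0):
    """residues: dict of id -> (xyz, probability, side-chain accessibility in %).
    Returns a list of cavities, each a list of residue ids."""
    # Step 1: select residues above the probability threshold, highest first.
    unassigned = sorted(
        (r for r in residues if residues[r][1] > threshold),
        key=lambda r: residues[r][1], reverse=True,
    )
    cavities = []
    # Steps 2-4: grow a cavity from the best remaining residue until no
    # further selected residue lies within merge_radius of a member.
    while unassigned:
        cavity = [unassigned.pop(0)]
        grew = True
        while grew:
            grew = False
            for r in list(unassigned):
                if any(math.dist(residues[r][0], residues[m][0]) <= merge_radius
                       for m in cavity):
                    cavity.append(r)
                    unassigned.remove(r)
                    grew = True
        cavities.append(cavity)
    # Steps 5-6: add solvent-accessible residues close to any cavity member.
    assigned = {r for cavity in cavities for r in cavity}
    for cavity in cavities:
        core = list(cavity)
        for r, (xyz, _prob, access) in residues.items():
            if r in assigned or access <= min_access:
                continue
            if any(math.dist(xyz, residues[m][0]) <= grow_radius for m in core):
                cavity.append(r)
                assigned.add(r)
    return cavities


if __name__ == "__main__":
    demo = {
        "A": ((0.0, 0.0, 0.0), 0.9, 10.0),
        "B": ((5.0, 0.0, 0.0), 0.7, 10.0),
        "C": ((10.0, 0.0, 0.0), 0.6, 10.0),
        "D": ((30.0, 0.0, 0.0), 0.8, 10.0),
        "E": ((32.0, 0.0, 0.0), 0.1, 50.0),   # exposed, next to D
        "F": ((13.0, 0.0, 0.0), 0.1, 10.0),   # near C but not accessible
    }
    print(group_cavities(demo))  # [['A', 'B', 'C'], ['D', 'E']]
```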


Input and Output

Input Format

Users can specify the 4-letter code of an existing protein structure in the PDB followed by selected chain identifiers
Example: 2FP7 (select whole structure) or 2FP7AB (select chain A and chain B)

Note: The chain identifier record is case sensitive.
Users also have the option to upload a file in PDB format.
In its current implementation, the server only processes one input structure per submission. For larger scale applications, the program can be downloaded and run locally.
Batch submissions will soon be supported.
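
For local scripting, the identifier format described above can be split as in this small sketch. It assumes single-character chain identifiers appended directly to the 4-character PDB code, as in the 2FP7AB example. The function name is hypothetical and does not describe the server's own parser.

```python
def parse_structure_id(text):
    """Split e.g. '2FP7AB' into ('2FP7', ['A', 'B']).
    An empty chain list means the whole structure. Chain case is preserved
    because chain identifiers are case sensitive."""
    text = text.strip()
    if len(text) < 4:
        raise ValueError("expected a 4-character PDB code, got %r" % text)
    return text[:4], list(text[4:])


if __name__ == "__main__":
    print(parse_structure_id("2FP7"))    # ('2FP7', [])
    print(parse_structure_id("2FP7AB"))  # ('2FP7', ['A', 'B'])
```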

Depth Computation

Options
Users have the option to select the following categories for depth calculation - residue-wise depth, all atoms (atomic) depth, Main Chain atoms, Side Chain atoms, Polar Side Chain atoms and Non-Polar Side Chain atoms (i.e. carbon atoms).
Output
For each atom category selected by the user, a residue-wise plot of depth values is displayed. The plot shows both the mean and the standard deviation of the depth values. The depth output is also available for download in tab-delimited and PDB formats. In the PDB format, depth values are recorded in the B-factor column.
When residue-wise depth is chosen as one of the options, a Jmol [http://www.jmol.org/] 3D rendering of the structure accompanies the output. The structure is rainbow-colored, with exposed residues colored red and buried residues colored blue.
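
To show what "recorded in the B-factor column" means in practice, the sketch below overwrites columns 61-66 of an ATOM/HETATM record, which is where the PDB format stores the temperature factor. The function name is hypothetical, and the snippet only illustrates the file layout, not how the server writes its output.

```python
def set_bfactor(line, value):
    """Return a PDB ATOM/HETATM line with columns 61-66 (the B-factor field)
    replaced by `value` in the usual 6.2f layout. Other lines pass through."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    padded = line.rstrip("\n").ljust(66)
    return padded[:60] + "%6.2f" % value + padded[66:]


if __name__ == "__main__":
    # Build a record field by field so the fixed-width columns are explicit.
    record = ("ATOM  " + "%5d" % 1 + " " + " N  " + " " + "MET" + " " + "A"
              + "%4d" % 1 + " " + "   "
              + "%8.3f%8.3f%8.3f" % (27.340, 24.430, 2.614)
              + "%6.2f%6.2f" % (1.00, 9.67))
    print(set_bfactor(record, 5.5))
```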

Protein Binding Cavity Prediction

Using an extensive benchmark of 900 ligand-bound proteins, we have established that some residues in most ligand binding cavities are simultaneously surface exposed and deep.
Options
The server predicts small molecule binding cavities on proteins. Prediction results may vary with the choice of depth computation parameters. For example, using a larger value for the minimum number of neighborhood waters results in the detection of larger cavities, which may be apt for larger ligands.
The algorithm estimates the probability value of forming part of a binding cavity for every residue of the protein. Users have the option to alter the recommended cavity prediction probability threshold.
Binding cavity prediction for biomolecules other than proteins (DNA, RNA, etc.) is currently not supported. (Please contact the authors for this application.)
Output
A residue-wise probability plot is displayed. A list of residues with probabilities greater than the threshold is displayed, along with the list of residues predicted to form the binding cavities. Users can also choose to download the results in PDB (with the B-factor column replaced by the probability value) or tab-delimited formats.
A 3D rendition of the cavity prediction is shown using Jmol. Residues of the predicted binding cavity are colored red while the rest of the protein is colored blue.

Solvent Accessible Surface Area

The normalized, residue-wise solvent accessible surface area of the protein is computed using the Shrake-Rupley algorithm [Shrake and Rupley, 1973]. The output is downloadable in tab-delimited format. The surface area is calculated for 5 categories of atoms: all atoms, Main Chain, Side Chain, Polar Side Chain and Non-Polar Side Chain (i.e. carbon atoms).
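
For readers unfamiliar with the Shrake-Rupley approach, here is a minimal per-atom sketch. It rests on assumptions not stated in the text: a 1.4 Å probe radius, a golden-spiral point set, and caller-supplied atomic radii. It returns raw areas per atom and does not perform the per-residue normalization mentioned above. Function names are hypothetical.

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * i),
                       r * math.sin(golden_angle * i), z))
    return points


def shrake_rupley(atoms, probe=1.4, n_points=200):
    """atoms: list of (x, y, z, radius). Returns accessible area per atom (Å^2).
    A test point on an atom's expanded sphere counts as exposed if it lies
    outside the expanded sphere of every other atom."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            p = (xi + big_r * ux, yi + big_r * uy, zi + big_r * uz)
            buried = any(
                j != i and math.dist(p, (xj, yj, zj)) < rj + probe
                for j, (xj, yj, zj, rj) in enumerate(atoms)
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


if __name__ == "__main__":
    # An isolated atom is fully exposed; a close neighbor buries part of it.
    print(shrake_rupley([(0.0, 0.0, 0.0, 1.7)]))
    print(shrake_rupley([(0.0, 0.0, 0.0, 1.7), (2.0, 0.0, 0.0, 1.7)]))
```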

Default Values

For multi-domain proteins, the recommended values are
  • Minimum number of neighborhood waters = 4
  • Cavity prediction probability threshold = 0.50
For single-domain proteins, the recommended values are
  • Minimum number of neighborhood waters = 5
  • Cavity prediction probability threshold = 0.45



Utility of DEPTH

  • provides wider dynamic range for residue burial (as compared to accessible surface area)
  • correlates well with protein stability
  • correlates well with amide proton hydrogen exchange rates
  • predicts protein-protein interaction hot spots
  • helps explain evolutionary variability in protein sequences

Predicting pKa of Polar Residues

pKa is a measure of acid strength, defined as the negative base-10 logarithm of the acid dissociation constant (Ka). The pKa of a protein residue describes how readily its ionizable group gains or loses a proton. Ionizable residues play a significant role in several aspects of protein function, including folding, stability, solubility and protein-protein interactions. To gain insight into these functions, it is often crucial to determine the pKa of ionizable residues accurately. The pKa values of ionizable groups are, however, highly sensitive to their environment, which makes pKa estimation difficult. Here we introduce a simple method for pKa prediction. Predictions are made for the ionizable residues ASP, GLU, HIS and LYS using the formula:

where the predicted pKa is a correction to the model pKa. The model pKa for a particular amino acid residue is determined for the case when the titratable group is completely accessible to the solvent and minimally perturbed by the surrounding environment. The correction terms take the form of a linear combination of 6 different features: main chain depth (MCdepth), polar side chain depth (polarSCdepth), the numbers of H-bond donors and acceptors (no. HbondsN and no. HbondsO), the electrostatic potential centred at the titratable group considering all partial charges within a cut-off distance of 8 Å (elec), and the solvent accessible area of the side chain (ASASC).
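
The formula itself is not reproduced in this text. Read literally, the description above (a model pKa plus a linear combination of six features) suggests the general shape below. The coefficient symbols c_1 to c_6 are placeholders introduced here, and their fitted values are not given in this document. The exact published form may differ, for instance by including a constant term or residue-specific coefficients.

```latex
\mathrm{p}K_a^{\mathrm{pred}}
  = \mathrm{p}K_a^{\mathrm{model}}
  + c_1\,\mathrm{MCdepth}
  + c_2\,\mathrm{polarSCdepth}
  + c_3\,(\text{no. HbondsN})
  + c_4\,(\text{no. HbondsO})
  + c_5\,\mathrm{elec}
  + c_6\,\mathrm{ASASC}
```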



Output

The pKa output is represented graphically (see below) on the results page, or can be downloaded as a tab-separated flat file. The results for GLU, ASP, LYS and HIS are shown in separate histograms. A horizontal line on each plot indicates the pKa value of the model residue.


MC Cavity Algorithm

Protein cavities are empty spaces and voids in the interior of a protein structure. When connected to the surface, these cavities are referred to as pockets. In general, folded proteins are tightly packed; however, the packing efficiency varies not only between different proteins but also between different regions within the same protein (1). Large protein cavities correspond to regions of low packing density. Cavity-creating mutations, in which a large amino acid is replaced by a small one, lead to a decrease in stability that is directly correlated with the size of the cavity (2). Conversely, cavity-filling mutations, which improve the internal packing of the protein, also improve its stability (2, 3). Surface pockets and grooves, on the other hand, are potential binding sites for drugs, ligands and other proteins. Detecting the location of cavities and pockets and accurately calculating their volumes is therefore important in the study of protein structure, stability and protein design.

MC_Cavity Algorithm

Step 1: Solvating the Protein Molecule

The protein molecule of interest is placed at the center of a pre-equilibrated box of solvent (water). The water model SPC216 is used for solvating the protein. Water molecules that clash sterically with atoms of the protein, i.e. those water molecules that are within 2.6 Å of protein atoms, are removed from the box.

Step 2: Defining Bulk and Cavity Waters

Non-bulk waters are those waters that are trapped in cavities and isolated from the bulk solvent (Figure 1). Such a water molecule is detected by counting the other water molecules in its immediate neighborhood. If there are fewer than 3 (default value) neighboring water molecules within a spherical volume of radius 4.2 Å (default value), the water is considered non-bulk (Figure 2).

Step 3: Sampling Solvent Configurations

The bulk water surrounding a protein is freely diffusing. To mimic the dynamics of bulk water, a Monte Carlo procedure is used wherein the protein is solvated repeatedly, each time in a different orientation. New orientations are generated by rotating the protein by a random angle about an axis passing through its center of mass, and translating it along the X-axis by a random distance < 2.8 Å (the average distance between neighboring water molecules in the box). With a sufficient number of iterations, water molecules can explore all regions accessible to bulk solvent, hence mimicking bulk-water dynamics (Figure 3). The default number of iterations is set to 10.

Step 4: Clustering Cavity Waters and Calculating Volumes

To cluster cavity waters, the solvated protein molecules (the protein together with its bulk and cavity waters) from all iterations are first superimposed so that the protein coordinates are identical across iterations. Our in-house software CLICK (4) is used for superimposition with default parameters. Next, cavity waters are clustered using a distance-based criterion. Cavity waters within 1.2 Å of each other are clustered as occurring in the same cavity. The clustering also ensures that no member of a particular cluster occurs within 2.5 Å of a member of another cluster. After clustering, each cavity or pocket is defined by a set of water molecules that often overlap with each other but do not clash with protein atoms, and that together define the shape and size of the cavity.
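
The 1.2 Å linking rule can be sketched as single-linkage clustering with a union-find structure. This is an assumption about how "clustered using a distance-based criterion" might be realized. The function name is hypothetical, and the sketch does not enforce the additional 2.5 Å separation between clusters mentioned above.

```python
import math


def cluster_waters(points, link=1.2):
    """Group points so that any two within `link` Å share a cluster
    (single linkage). Returns clusters as sorted lists of point indices."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= link:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    waters = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
              (10.0, 0.0, 0.0), (10.5, 0.0, 0.0)]
    print(cluster_waters(waters))  # [[0, 1, 2], [3, 4]]
```

Note that waters 0 and 2 are 2.0 Å apart yet share a cluster, because each is within 1.2 Å of water 1. That chaining is what single linkage implies.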
To calculate the volumes of cavities, we use the Voronoi procedure (5). Given a set of points in space, the Voronoi procedure partitions space among the points by constructing a polyhedron around each point. We use the radical plane method to construct the Voronoi polyhedra around each cavity water and thus calculate the volume.
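
As a rough illustration of the radical plane idea, the sketch below assigns each voxel of a bounding box to the site with the smallest power distance d² − r², which places the dividing plane between two sites according to their radii. This voxel count is only a numerical approximation introduced here, not the exact polyhedron construction the server uses. The function name, the box and the radii are all assumptions.

```python
def radical_cell_volumes(sites, box_min, box_max, step=0.5):
    """sites: list of (x, y, z, radius). Returns the approximate volume (Å^3)
    of each site's radical-plane cell inside the given box, by assigning every
    voxel center to the site with the smallest d^2 - r^2."""
    counts = [0] * len(sites)
    n = [int(round((box_max[k] - box_min[k]) / step)) for k in range(3)]
    for ix in range(n[0]):
        x = box_min[0] + (ix + 0.5) * step
        for iy in range(n[1]):
            y = box_min[1] + (iy + 0.5) * step
            for iz in range(n[2]):
                z = box_min[2] + (iz + 0.5) * step
                best = min(
                    range(len(sites)),
                    key=lambda s: (x - sites[s][0]) ** 2
                    + (y - sites[s][1]) ** 2
                    + (z - sites[s][2]) ** 2
                    - sites[s][3] ** 2,
                )
                counts[best] += 1
    return [c * step ** 3 for c in counts]


if __name__ == "__main__":
    box_lo, box_hi = (-2.0, -2.0, -2.0), (2.0, 2.0, 2.0)
    # Equal radii: the dividing plane is the perpendicular bisector.
    print(radical_cell_volumes([(-1.0, 0.0, 0.0, 1.4), (1.0, 0.0, 0.0, 1.4)],
                               box_lo, box_hi))
    # Unequal radii: the plane shifts toward the smaller site.
    print(radical_cell_volumes([(-1.0, 0.0, 0.0, 1.5), (1.0, 0.0, 0.0, 1.0)],
                               box_lo, box_hi))
```

In a real cavity calculation the sites would include the surrounding protein atoms as well as the waters, so that each water's cell is closed off by its neighbors rather than by an arbitrary box.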

References:

  1. Richards, F. M. (1974) J. Mol. Biol. 82, 1–14
  2. Chakravarty, S., Bhinge, A. and Varadarajan, R. (2002) J. Biol. Chem. 277, 31345–31353
  3. Saha, P., Barua, B., Bhattacharyya, S., Balamurali, M. M., Schief, W. R., Baker, D. and Varadarajan, R. (2011) Biochem. 50, 7891–790
  4. Nguyen, M. N., Tan, K. P. and Madhusudhan, M. S. (2011) Nucleic Acids Res. 39, W24–28
  5. Gerstein, M., Tsai, J. and Levitt, M. (1995) J. Mol. Biol. 249, 955–966
