Methodology
Selection Strategy
PepX was constructed from the Protein Data Bank (PDB). We selected protein-peptide complexes that meet the following criteria:
- X-ray structures with a resolution better than 2.5 Å (i.e. a numerically lower value)
- peptides of 5 to 35 amino acids
- peptides containing natural amino acids only
- receptors of at least 35 amino acids
- only the first unit of the PDB entry in case of crystallographic symmetry
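The criteria above can be expressed as a simple filter. The sketch below is a minimal illustration, not the PepX pipeline: the `Complex` record, its field names and the `passes_selection` helper are hypothetical, and the crystallographic-symmetry rule is left out because it depends on how the PDB entry is parsed.

```python
from dataclasses import dataclass

# Hypothetical record for one candidate complex; field names are illustrative
# and not part of PepX or the PDB file format.
@dataclass
class Complex:
    entry_id: str
    method: str            # experimental method, e.g. "X-RAY DIFFRACTION"
    resolution: float      # in Å; numerically lower means higher quality
    peptide_seq: str       # one-letter sequence of the peptide chain
    receptor_length: int   # number of receptor residues

# The 20 standard (natural) amino acids in one-letter code.
NATURAL = set("ACDEFGHIKLMNPQRSTVWY")

def passes_selection(c: Complex) -> bool:
    """Apply the selection criteria (symmetry rule omitted in this sketch)."""
    return (
        c.method == "X-RAY DIFFRACTION"
        and c.resolution < 2.5
        and 5 <= len(c.peptide_seq) <= 35
        and set(c.peptide_seq) <= NATURAL
        and c.receptor_length >= 35
    )

# Made-up candidates: only the first one satisfies every criterion.
candidates = [
    Complex("toy1", "X-RAY DIFFRACTION", 1.9, "GRKKRRQRRR", 250),
    Complex("toy2", "X-RAY DIFFRACTION", 3.0, "ACDEFG", 120),   # resolution too low
    Complex("toy3", "X-RAY DIFFRACTION", 2.1, "ACXDE", 300),    # non-natural residue
]
kept = [c.entry_id for c in candidates if passes_selection(c)]
print(kept)  # ['toy1']
```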
Peptide Definition
In PepX, we define a peptide as follows:
- The peptide chain is 5 to 35 amino acids long
- The peptide contains only natural amino acids and no metal ions involved in binding the peptide chain (e.g. zinc)
- The peptide is bound to a receptor of at least 35 amino acids
- Disulfide bonds are annotated for the peptide chain. The presence of disulfide bonds in the peptide chain suggests that the chain probably adopts a fold in isolation, and it should therefore rather be classified as a "mini-protein" than as a "peptide". We are working towards a definition of a peptide that is not based on chain length alone; for the moment, we only annotate special features such as disulfide bonds.
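Disulfide annotation can be sketched as a distance check between cysteine SG atoms of the peptide chain. This is an assumption-laden illustration, not the PepX procedure: the 2.5 Å cutoff is a heuristic chosen here (a typical S-S bond is about 2.05 Å), and the helper names and the `possible_mini_protein` flag are hypothetical. As in the definition above, a disulfide only annotates the chain; it does not exclude it.

```python
import math
from itertools import combinations

# Assumed heuristic cutoff (Å) between SG atoms of two bonded cysteines.
SS_CUTOFF = 2.5

def disulfide_pairs(cys_sg):
    """Return pairs of cysteine residue numbers whose SG atoms are bonded.

    cys_sg maps a residue number to the (x, y, z) coordinates of its SG atom.
    """
    pairs = []
    for (ri, ai), (rj, aj) in combinations(sorted(cys_sg.items()), 2):
        if math.dist(ai, aj) <= SS_CUTOFF:
            pairs.append((ri, rj))
    return pairs

def annotate_peptide(cys_sg):
    """Annotate (not exclude) a peptide chain with its disulfide bonds."""
    pairs = disulfide_pairs(cys_sg)
    # Hypothetical flag: a disulfide hints at a fold, i.e. a mini-protein.
    return {"disulfides": pairs, "possible_mini_protein": bool(pairs)}

# Toy SG coordinates: Cys3 and Cys10 are 2.04 Å apart, Cys15 is far away.
sg = {3: (0.0, 0.0, 0.0), 10: (2.04, 0.0, 0.0), 15: (9.0, 0.0, 0.0)}
print(annotate_peptide(sg))
# {'disulfides': [(3, 10)], 'possible_mini_protein': True}
```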
Clustering Algorithm
All protein-peptide complexes in PepX were clustered on their binding sites using hierarchical agglomerative clustering, the same algorithm used to construct BriX. The distance matrix used in the clustering contains the RMSD values between every pair of protein-peptide binding sites, computed with Mustang.
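Hierarchical agglomerative clustering on a precomputed RMSD matrix can be sketched as follows. The linkage criterion is not stated in the text; this sketch assumes complete linkage, so that no two members of a cluster differ by more than the threshold. The function name and the toy matrix are hypothetical, and the RMSD values are assumed to come from Mustang alignments computed beforehand.

```python
def complete_linkage_clusters(dist, threshold):
    """Agglomerative clustering with (assumed) complete linkage.

    dist: symmetric n x n list of pairwise RMSD values (Å).
    threshold: merging stops once the closest pair of clusters would
    contain two members more than this RMSD apart.
    Returns a list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: the largest RMSD between any two members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        merged = sorted(clusters[x] + clusters[y])
        del clusters[y]          # y > x, so index x remains valid
        clusters[x] = merged
    return clusters

# Toy RMSD matrix (Å) for four binding sites.
rmsd = [
    [0.0, 0.8, 1.7, 3.0],
    [0.8, 0.0, 1.5, 2.9],
    [1.7, 1.5, 0.0, 2.6],
    [3.0, 2.9, 2.6, 0.0],
]
print(complete_linkage_clusters(rmsd, 1.0))  # [[0, 1], [2], [3]]
print(complete_linkage_clusters(rmsd, 2.0))  # [[0, 1, 2], [3]]
```

The naive pair search is cubic or worse in the number of sites; it keeps the merging rule easy to read rather than being efficient for a database-sized matrix.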
Alignment
The Alignment value expresses the percentage of the binding site of a protein-peptide complex that is used in clustering. The higher the alignment, the larger the part of the binding site that is compared; since comparing more residues typically yields larger RMSD values, a higher alignment generates more clusters.
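One way to picture the effect of the Alignment value is to compute the RMSD over only the best-matching fraction of binding-site residues. This is a hypothetical interpretation for illustration, not the documented PepX computation: `partial_rmsd` is an invented helper, and the per-residue deviations are assumed to come from an existing superposition.

```python
import math

def partial_rmsd(deviations, alignment_pct):
    """RMSD over the best-aligned fraction of binding-site residues.

    deviations: per-residue distances (Å) between equivalent residues of two
    superposed binding sites.
    alignment_pct: percentage (0, 100] of the binding site to include.
    """
    if not 0 < alignment_pct <= 100:
        raise ValueError("alignment_pct must be in (0, 100]")
    n = max(1, math.ceil(len(deviations) * alignment_pct / 100))
    best = sorted(deviations)[:n]   # keep the n smallest deviations
    return math.sqrt(sum(d * d for d in best) / n)

devs = [0.5, 0.5, 0.5, 0.5, 3.0]
print(partial_rmsd(devs, 80))             # 0.5
print(round(partial_rmsd(devs, 100), 2))  # 1.41
```

Including the poorly matching fifth residue raises the RMSD from 0.5 Å to about 1.41 Å, so under a 1 Å threshold the two sites would no longer be grouped together, which is why a higher alignment tends to yield more clusters.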
Threshold
The Threshold value is the maximum allowed root-mean-square deviation (RMSD) between two PDB entries, expressed in ångström (Å). For tighter clustering (generating more clusters), choose a small value (e.g. 1 Å); for fewer clusters, choose a higher value (e.g. 2 Å).
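For reference, the RMSD between two sets of N equivalent atoms is the square root of the mean squared distance between paired atoms. The sketch below computes it for coordinates that are assumed to be already superposed (the superposition step, done by Mustang in PepX, is not shown) and compares it with a threshold; both function names are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD (Å) between two equally long, already superposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

def within_threshold(coords_a, coords_b, threshold):
    """True if the two sites are no more than `threshold` Å RMSD apart."""
    return rmsd(coords_a, coords_b) <= threshold

a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 1.5, 0.0), (1.0, 1.5, 0.0), (2.0, 1.5, 0.0)]
print(rmsd(a, b))                   # 1.5
print(within_threshold(a, b, 1.0))  # False
print(within_threshold(a, b, 2.0))  # True
```

The same pair of sites is kept apart at 1 Å but allowed together at 2 Å, which is the mechanism behind "smaller threshold, more clusters".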