|
|
Figure 1. Protein-peptide dataset. PepX contains 505 unique protein-peptide interface clusters from 1431 PDB entries, representing the diversity of structural information on protein-peptide complexes available in the PDB. 47% of all protein-peptide complexes available from the PDB fall into only 10 classes, containing complexes with peptides bound to MHC, thrombins, α-ligand binding domains, SH3 domains, PDZ domains and others. |
|
Figure 2. Distribution of ligand size in the database as the percentage of complexes for each ligand length. The smallest ligand considered is 5 amino acids long; the longest consists of 35 residues. Circa 70% of all peptides lie within the [5-15] residue range. |
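The percentages in a length distribution of this kind can be computed as in the following minimal sketch. The function names and the toy length list are hypothetical and for illustration only; they are not taken from PepX or its data.

```python
from collections import Counter


def percent_per_length(lengths):
    """Return {length: percentage of complexes with that ligand length}."""
    counts = Counter(lengths)
    total = len(lengths)
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


def percent_in_range(lengths, lo, hi):
    """Percentage of ligands whose length lies in the closed range [lo, hi]."""
    inside = sum(1 for n in lengths if lo <= n <= hi)
    return 100.0 * inside / len(lengths)


# Hypothetical toy data (NOT the PepX ligand lengths).
toy_lengths = [5, 7, 9, 9, 12, 15, 18, 22, 30, 35]

per_length = percent_per_length(toy_lengths)
share_5_to_15 = percent_in_range(toy_lengths, 5, 15)
```

Applied to the real ligand lengths, `percent_in_range(lengths, 5, 15)` would correspond to the share of peptides quoted for the [5-15] range.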
|
|
|
Figure 3. Distribution of receptor size in the database as the percentage of complexes for each receptor length. The largest protein in the complexes contains 2552 amino acid residues; the shortest considered is 35 residues long. Most proteins are smaller than 600 residues, with a peak in the [300-400] range. |
|
|
Figure 4. Receptor sequence redundancy within the PepX database for all complexes (blue) and the centroid set (red). The receptor sequences in the PepX database were clustered with the cd-hit algorithm at various sequence identity thresholds, from removing only identical sequences (100%) down to 40% sequence identity. Although there is large sequence redundancy within the database, this does not always reflect a redundancy in binding modes. For instance, removing only identical sequences (100%) results in a loss of more than 60% of all complexes and more than 20% of the centroids, showing that some receptors bind in different structural modes. |
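The 100% threshold can be illustrated with a minimal sketch that only removes exact duplicate sequences and reports the percentage of entries lost. This is not cd-hit, and it does not cover the lower identity thresholds, which require actual sequence clustering. The function name and the toy sequences are hypothetical.

```python
def percent_lost_at_full_identity(sequences):
    """Percentage of entries removed when exact duplicate sequences are collapsed."""
    unique = set(sequences)
    return 100.0 * (len(sequences) - len(unique)) / len(sequences)


# Hypothetical toy receptor sequences (NOT taken from PepX).
toy_receptors = ["MKTAYIAK", "MKTAYIAK", "GSHMLEDP", "MKTAYIAK", "AVLQSGFR"]

lost = percent_lost_at_full_identity(toy_receptors)
```

Running the same calculation once on all complexes and once on the centroids only gives the two curves' values at the 100% threshold.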
|
Figure 5. Distribution of the number of elements in the PepX clusters for various thresholds of structural similarity (1, 2 and 3 Å) and binding site alignment (50% (A), 75% (B) and 95% (C)). For all settings the largest number of clusters contains only one complex, going from 63% of all clusters (S1A, 50% and 3 Å) to 87% of all clusters (S1C, 95% and 1 Å). |
|
|
Figure 6. General annotation statistics. Percentage of receptors in the PepX database represented by different annotations: SCOP, CATH, Pfam and UniProt. Coverage is highest for UniProt (>80%), followed by structural classifications by CATH (ca. 70%) and SCOP (ca. 55%), and finally protein family annotation by Pfam (ca. 50%). |
|
|
Figure 7. Population of the SCOP hierarchy with protein-peptide complexes. Although most SCOP classes are represented by receptors in the database, protein-peptide complexes do not represent the full range of SCOP folds, superfamilies and families. |
|
|
Figure 8. Distribution of structures in the different SCOP classes for the PepX database (A) and the full SCOP database (B). Whereas the α, β, α/β and α+β classes are of similar size in the full SCOP database, the all-β and α+β proteins are overrepresented in PepX.
|
|
|
Figure 9. Protein-peptide complexes in the CATH hierarchy. Every CATH class is represented by complexes, and architectures are well represented too (50%). In contrast, at lower CATH levels, fewer than 10% of both topologies and superfamilies hold at least one protein-peptide complex. |
|
|
Figure 10. Distribution of structures in the different CATH classes. In accordance with the SCOP classification, classes with mainly β-structures are largely overrepresented. Alpha and beta structures are underrepresented (35% in PepX versus 52% in full CATH), which is also seen in SCOP when the α/β and α+β classes are merged, although the difference is smaller (43% in PepX versus 49% in full SCOP). |