School of Chemistry & Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
While sequences in protein super-families can be remarkably diverse, even the most evolutionarily distant variants retain structural features. Analyses of remote homologues must deal with this sequence diversity. We hypothesize that ancestral reconstruction may benefit from the guidance of a model of structural evolutionary change, one that describes how evolution acts on structure. Dayhoff's point accepted mutation (PAM) matrix expresses which amino acid substitutions are accepted by natural selection. Enough protein structures have now been determined to develop structural 1-PAM matrices. Here, we use the PDB and two well-known structural descriptors, secondary structure and relative solvent accessibility, to develop such matrices. We compiled clusters of proteins with sequence identity higher than 85%, resulting in 592 proteins falling into 75 clusters. From this 592/75 dataset, we calculated 1-PAM matrices for amino acids, three- and seven-class secondary structure, and two-class relative solvent accessibility. Reassuringly, the correlation between the 592/75 amino acid 1-PAM matrix and the Dayhoff matrix is high at 0.89, which suggests that the structural 1-PAM matrices also capture evolutionary change over an interval comparable to that of the amino acid 1-PAM. The following results stand out. First, substitutions between helix and beta-sheet are improbable according to 1-PAM, but probabilities range from 0.002 to 0.017 for substitutions between bend or turn, helix, and beta-sheet. The seven-class secondary structure matrix reveals non-obvious tendencies between uncommon secondary structure classes; over a longer timeframe, 250-PAM reveals that helix and beta-sheet substitute with 15% probability. Second, exposed and buried status changes with 0.006 and 0.03 probability, respectively, in line with the general observation that exposed amino acids mutate freely, but the exposed status itself is conserved.
We are currently using these structural evolutionary models to improve sensitivity in the analysis of families that employ structurally common but sequence-diverse interaction interfaces, and in the reconstruction of ancestral proteins dating back a billion years.
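The construction of a 1-PAM matrix and its extrapolation to 250-PAM can be illustrated with a minimal sketch. This is not the authors' pipeline: the three-class counts below are hypothetical placeholders, the function names are invented for illustration, and the linear rescaling to 1% expected change is a simplification of Dayhoff's procedure. It assumes a symmetric matrix of aligned-state counts pooled from closely related proteins.

```python
import numpy as np


def pam1_from_counts(counts):
    """Build a column-stochastic 1-PAM matrix from symmetric pair counts.

    counts[i, j] is the number of aligned positions with state i in one
    protein and state j in its close relative (diagonal = unchanged).
    """
    counts = np.asarray(counts, dtype=float)
    col_totals = counts.sum(axis=0)
    freqs = col_totals / counts.sum()      # background state frequencies
    observed = counts / col_totals         # each column sums to 1
    identity = np.eye(counts.shape[0])
    # Frequency-weighted fraction of positions that changed state.
    change = float(np.sum(freqs * (1.0 - np.diag(observed))))
    # Rescale off-diagonal mass so the expected change is exactly 1%.
    scale = 0.01 / change
    pam1 = identity + scale * (observed - identity)
    return pam1, freqs


def extrapolate(pam1, n):
    """Return the n-PAM matrix as the n-th matrix power of 1-PAM."""
    return np.linalg.matrix_power(pam1, n)


# Hypothetical counts for helix (H), strand (E), coil (C); NOT real data.
COUNTS = np.array([
    [9000,   20,  300],
    [  20, 6000,  250],
    [ 300,  250, 8000],
])

PAM1, FREQS = pam1_from_counts(COUNTS)
PAM250 = extrapolate(PAM1, 250)

if __name__ == "__main__":
    np.set_printoptions(precision=4, suppress=True)
    print("1-PAM:\n", PAM1)
    print("250-PAM:\n", PAM250)
```

In this sketch a direct helix-to-strand substitution is rare in the 1-PAM matrix, yet its probability grows under repeated multiplication because changes can route through the coil state, which is the same qualitative effect the abstract reports for 250-PAM.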