The University of Queensland
With increasing amounts of sequencing data becoming widely available, tools are required that can transform linear DNA sequence into biologically meaningful information. Ancestral sequence reconstruction (ASR), which connects homologous proteins and resurrects their common ancestors, is now considered a promising tool in protein engineering for generating novel enzymes. However, current ASR methods are limited to inferring ancestral proteins for protein families of restricted size (several hundred sequences) and fail when applied to protein superfamilies with low sequence identity. We are developing methods that reconstruct ancestral populations, rather than the single most likely ancestral sequence produced by current ASR tools. We also address two significant limitations of current ASR methods for diverse protein families: the difficulty of obtaining a robust multiple sequence alignment and the uncertainty of phylogenetic analysis. We address the first by generating partial order graphs, which sidestep some of the issues introduced by the placement of gaps, and the second by using multiple phylogenetic trees that collectively explain the distances between family members. To illustrate and test these methods, we have resurrected the ancestors of xenobiotic-metabolising cytochrome P450 enzymes, the extant forms of which are responsible for the metabolic clearance of ~90% of all drugs in humans. Several ancestral proteins of different evolutionary ages were inferred and reconstructed from the CYP2 and CYP3 families and shown to be more thermostable than their respective descendants. The vertebrate CYP2 and CYP3 ancestors bound a range of ligands of diverse size and structure similar to that bound by the cognate extant P450s, suggesting a comparable degree of substrate promiscuity. We hypothesise that the structural robustness of the ancestral P450 fold facilitated the diversification of these enzymes during evolution.