Bayesian refinement of protein functional site matching

Mardia, K.V., Green, P.J., Nyirongo, V.B., Gold, N.D. & Westhead, D.R.

Matching functional sites is a key problem for the understanding of protein function and evolution. The commonly used graph theoretic approach, and other related approaches, require adjustment of a matching distance threshold it a priori according to the noise in atomic positions. This is difficult to pre-determine when matching sites related by varying evolutionary distances and crystallographic precision. Furthermore, sometimes the graph method is unable to identify alternative but important solutions in the neighbourhood of the distance based solution because of strict distance constraints. We consider the Bayesian hierarchical approach to improve graph based solutions. In principal this approach applies to other methods with strict distance matching constraints.

We present a new meta-algorithm for matching protein functional sites (active sites and ligand binding sites) based on an initial graph matching followed by refinement using a Markov chain Monte Carlo (MCMC) procedure. This procedure is an extension to our very recent work. The method accounts for the 3D form of the site as well as the physico-chemical properties of the constituent amino acids. Computer routines in R are available at MCMC refinement step is able to significantly improve graph based matches. We apply the method to matching NAD(P)(H) binding sites within single Rossmann fold families, between different families in the same superfamily, and in different folds. Within families sites are often well conserved, but there are examples where significant shape based matches do not retain similar amino acid chemistry, indicating that even within families the same ligand may be bound using substantially different physico-chemistry. We also show that the procedure finds significant matches between binding sites for the same co-factor in different families and different folds.