Statistical Bioinformatics

Electrophoretic gel with annotated landmarks (see, for example, Green and Mardia, 2006).

“The lack of real contact between mathematics and biology is either a tragedy, a scandal or a challenge, it is hard to decide which”.

Gian-Carlo Rota, Discrete Thoughts (1986)

Our growing understanding of the human genome has been one of the great success stories of biology over the past 50 years. The DNA in the genome contains genes, which carry the information needed to build proteins; these proteins in turn carry out the functions of life.

Many deep questions remain, and statistical ideas will play a key role in understanding genes and proteins. Central problems in bioinformatics include:

(i) measuring when different genes are expressed at different stages of a life cycle;

(ii) measuring the differences between genes in different species.

The shape of a protein is one of the key properties that determine how the protein works. Advances in geometric statistics, and particularly in shape analysis, arising from the LASR workshops enable us to explore new ways of modelling protein structure.

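A key challenge in adapting shape analysis to protein bioinformatics is the lack of natural labelling on proteins: there is no given correspondence between the atoms of one protein and those of another, so the matching itself has to be inferred.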

“At the end of last year, representatives from around 30 university research departments, supported by the BBSRC, met to discuss this issue*. They concluded that action is needed urgently to prevent the UK losing its world-leading position in research as the biosciences become increasingly quantitative and predictive.”

* (Mathematical skill for biosciences students).

“Data Rich but Maths Poor”, Julia M. Goodfellow CBE, LASR Proceedings 2006, p. 99.

Work in this area includes the bioinformatics of matching active sites through Bayesian hierarchical models and Bayesian refinement, high-dimensional directional distributions for Ramachandran plots, and directional hidden Markov models for synthetic proteins.

Edited Volumes:

2003    Stochastic Geometry, Biological Structure and Images, co-editors R. G. Aykroyd and M. J. Langdon. LASR Proceedings: Leeds University Press.

2004    Bioinformatics, Images, and Wavelets, co-editors R. G. Aykroyd and S. Barber. LASR Proceedings: Leeds University Press.

2005    Quantitative Biology, Shape Analysis, and Wavelets, co-editors S. Barber, P. D. Baxter and R. E. Walls. LASR Proceedings: Leeds University Press.

2006    Interdisciplinary Statistics and Bioinformatics, co-editors S. Barber, P. D. Baxter and R. E. Walls. LASR Proceedings: Leeds University Press.

2007    Systems Biology & Statistical Bioinformatics, co-editors S. Barber and P. D. Baxter. LASR Proceedings: Leeds University Press.

2008    The Art & Science of Statistical Bioinformatics, co-editors S. Barber, P. D. Baxter and A. Gusnanto. LASR Proceedings: Leeds University Press.

Papers in Journals:

2006    Bayesian alignment using hierarchical models, with applications in protein bioinformatics (with P. J. Green). Biometrika, 93, 235-254.

2007    Protein bioinformatics and mixtures of bivariate von Mises distributions for angular data (with C. C. Taylor and M. Subramaniam). Biometrics, 63, 505-512.

2007    Bayesian refinement of protein functional site matching (with V. B. Nyirongo, P. J. Green, N. G. Gold, and D. R. Westhead). BMC Bioinformatics, 8, 257. http://www.biomedcentral.com/1471-2105/8/257.

2007    The Poisson index: a new probabilistic model for protein-ligand binding site similarity (with J. R. Davies, R. M. Jackson, and C. C. Taylor). Bioinformatics, 23, 3001-3008.

2008    A multivariate von Mises distribution with applications to bioinformatics (with G. Hughes, C. C. Taylor and H. Singh). Canadian Journal of Statistics, 36, 99-109.

2008    Matching unlabelled configurations and protein bioinformatics (with J. T. Kent and C. C. Taylor). Submitted as a read paper for the half-day research meeting of the Royal Statistical Society on “Inference Methods for Complex and High-Dimensional Structural Systems”.

2008    A generative, probabilistic model of local protein structure (with W. Boomsma, C. C. Taylor, J. Ferkinghoff-Borg, A. Krogh and T. Hamelryck). Proceedings of the National Academy of Sciences, 105, 8932-8937.

2008    Simulating virtual protein $C_\alpha$ traces with applications (with V. B. Nyirongo). Journal of Computational Biology, 15, 1209-1220.

2008    Protein structure prediction using a probabilistic model of local structure (with W. Boomsma, M. Borg, J. Frellsen, T. Harder, K. Stovgaard, J. Ferkinghoff-Borg, A. Krogh and T. Hamelryck). 8th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP8), Cagliari, Sardinia, Italy, December 3-7, 2008, pp. 82-83.

Papers in Edited Volumes:

2001    Statistics and molecular structure of biological macromolecules (with E. Demchuk, H. Singh, V. Hnizdo, and D. S. Sharp). In LASR’20, pp. 9-14, Leeds University Press.

2003    Structural bioinformatics revisited (with C. C. Taylor and D. R. Westhead). LASR2003 Proceedings. (Editors: R. G. Aykroyd, K. V. Mardia & M. J. Langdon), Leeds University Press, pp. 11-18.

2005    A vision of statistical bioinformatics. LASR2005 Proceedings. (Editors: S. Barber, P. D. Baxter, K. V. Mardia & R. E. Walls), Leeds University Press, pp. 9-20.

2005    Protein gels matching (with V. Patrangenaru and S. Sugathadasa). LASR2005 Proceedings. (Editors: S. Barber, P. D. Baxter, K. V. Mardia & R. E. Walls), Leeds University Press, pp. 163-165.

2006    Modeling protein folds with a trivariate von Mises distribution (with G. Hughes and C. C. Taylor). LASR2006 Proceedings. (Editors: S. Barber, P. D. Baxter, K. V. Mardia and R. E. Walls), Leeds University Press, pp. 120-123.

2006    Matching pesticides to proteins to predict toxicity (with E. M. Petty, C. C. Taylor, and Q. Chaudhry). LASR2006 Proceedings. (Editors: S. Barber, P. D. Baxter, K. V. Mardia and R. E. Walls), Leeds University Press, pp. 150-153.

2006    Graphical models and directional statistics capture protein structure (with W. Boomsma, J. T. Kent, C. C. Taylor, and T. Hamelryck). LASR2006 Proceedings. (Editors: S. Barber, P. D. Baxter, K. V. Mardia and R. E. Walls), Leeds University Press, pp. 91-94.

2007    Statistical modelling of globular proteins (with V. B. Nyirongo). LASR2007 Proceedings. (Editors: S. Barber, P. D. Baxter, and K. V. Mardia), Leeds University Press, p. 120.

2007    On some recent advancements in applied shape analysis and directional statistics. LASR2007 Proceedings. (Editors: S. Barber, P. D. Baxter, and K. V. Mardia), Leeds University Press, pp. 9-17.