Dirichlet process mixture model with applications to bioinformatics

Kanti V. Mardia, John T. Kent, Zhengzheng Zhang and Charles C. Taylor

Motivated by examples in protein bioinformatics, we study a mixture model of multivariate angular distributions. The distribution treated here, the multivariate sine distribution, is a multivariate extension of the well-known von Mises distribution on the circle. The sine distribution has an intractable normalizing constant, which we propose to replace in the concentrated case by a simple approximation. We study a Dirichlet process mixture (DPM) model of concentrated sine distributions and apply it to practical examples from protein bioinformatics. We also discuss issues arising from the DPM model, such as the 'label switching' problem.