Matching unlabelled configurations and protein

John T. Kent, Kanti V. Mardia & Charles C. Taylor

Unlabelled shape analysis is a challenging and complex problem with many applications. Consider two configurations of unlabelled "landmarks" in d dimensions, and suppose that, after some transformation, subsets of each set of landmarks can be closely aligned. The objective is to find the transformation and the corresponding pairing, or matching function, between the two subsets of landmarks. Using a regression approach, we formulate an injective labelling model. An approximate solution is obtained using a mixture model, and inference is carried out using the EM algorithm together with integer programming. Starting values are obtained using a type of forward search algorithm. Applications are made to two problems in bioinformatics. In the first, the landmarks are the locations of amino acids in two proteins. In the second, the landmarks are the locations of dark spots in planar images of two electrophoretic gels.
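As a rough illustration of the two interleaved subproblems (estimating the transformation given a matching, and finding an injective matching given the transformation), the following Python sketch alternates between them. It is not the authors' EM/mixture-model procedure. It swaps in hard assignments (an ICP-style alternation), a rigid transformation fitted by the Kabsch/Procrustes SVD, and scipy's linear_sum_assignment in place of a general integer-programming solver. The cutoff penalty for leaving a landmark unmatched and the user-supplied starting values (standing in for the forward search) are assumptions, and the function names fit_rigid, match and align are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def fit_rigid(x, y):
    """Least-squares rotation r and translation t with y ~ x @ r.T + t (Kabsch)."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    h = (x - mx).T @ (y - my)                      # d x d cross-covariance
    u, _, vt = np.linalg.svd(h)
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    dmat = np.diag([1.0] * (x.shape[1] - 1) + [sign])  # exclude reflections
    r = vt.T @ dmat @ u.T
    t = my - mx @ r.T
    return r, t


def match(x, y, r, t, cutoff):
    """Injective partial matching of rows of x to rows of y.

    Each matched pair costs its squared distance after transforming x;
    a landmark left unmatched costs `cutoff` (an assumed penalty).
    Capping costs at `cutoff` and dropping capped pairs solves this exactly.
    """
    z = x @ r.T + t
    d2 = ((z[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(np.minimum(d2, cutoff))
    keep = d2[rows, cols] < cutoff
    return list(zip(rows[keep].tolist(), cols[keep].tolist()))


def align(x, y, r, t, cutoff, n_iter=50):
    """Alternate matching and rigid refitting from starting values (r, t)."""
    pairs = []
    for _ in range(n_iter):
        new_pairs = match(x, y, r, t, cutoff)
        # stop when the matching is stable or too small to fit a transformation
        if new_pairs == pairs or len(new_pairs) < x.shape[1]:
            break
        pairs = new_pairs
        i, j = map(list, zip(*pairs))
        r, t = fit_rigid(x[i], y[j])
    return r, t, pairs
```

Like EM, this alternation only finds a local optimum, which is why good starting values matter; landmarks in y with no partner in x (and vice versa) are simply left out of the returned pairs.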

Keywords: Allocation; EM algorithm; Forward Search; Integer Programming; Mixture Model; Protein Gel; Shape Analysis; Transformation.