Most software for detecting orthologous proteins relies on pairwise comparisons of amino acid sequences to decide whether two proteins are orthologous. Consequently, the number of comparisons to compute grows quadratically with the number of sequences. Circumventing this growth with new methodologies is a current challenge in bioinformatics research, especially given the increasing number of sequenced organisms available. We propose to investigate the detection of orthologous proteins from a new perspective, building on our previous experience with solving computationally demanding tasks by using strings of domains to characterise proteins.
We present two new protein similarity measures based on domain arrangements, a cosine score and a maximal weight matching score, together with new software named porthoDA. The quality of both similarity measures is assessed on curated datasets. The results show that domain arrangement similarities can correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDA, the wrapper developed for proteinortho. PorthoDA uses domain arrangement similarity to group proteins together before searching for orthologs. Grouping by domain arrangements rather than amino acid sequences reduces the search space and thereby the computational complexity of the all-against-all sequence comparison.
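As an illustration of the idea behind the cosine measure, the following minimal sketch treats each domain arrangement as a bag of domain identifiers and compares the resulting count vectors. This is an assumption made for illustration only: the exact definition and any weighting used in porthoDA are not given here, and the function name and example arrangements are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(arrangement_a, arrangement_b):
    """Cosine similarity between two domain arrangements.

    Each arrangement is a list of domain identifiers. Domain order is
    ignored here (an assumption of this sketch); only counts matter.
    """
    counts_a = Counter(arrangement_a)
    counts_b = Counter(arrangement_b)

    # Dot product over the domains that occur in both arrangements.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[d] * counts_b[d] for d in shared)

    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a protein without annotated domains matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical arrangements written as lists of Pfam-style identifiers.
protein_x = ["PF00018", "PF00017", "PF00069"]
protein_y = ["PF00017", "PF00069"]
protein_z = ["PF00096", "PF00096"]

score_xy = cosine_similarity(protein_x, protein_y)  # two shared domains
score_xz = cosine_similarity(protein_x, protein_z)  # no shared domain
```

In this sketch, proteins that share most of their domains score close to 1 and proteins without common domains score 0, which is the property that allows proteins to be grouped before any sequence comparison is run.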
We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, drastically simplifies the search space. PorthoDA speeds up orthology detection while maintaining an accuracy similar to that of proteinortho.
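The effect of grouping on the search space can be made concrete with a small count of pairwise comparisons. The sketch below is a simplification under stated assumptions: the group sizes are invented for illustration, every pair within a group is assumed to be compared, and the helper names are hypothetical rather than part of porthoDA or proteinortho.

```python
def pair_count(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def grouped_pair_count(group_sizes):
    """Pairs compared when sequences are only compared within their group."""
    return sum(pair_count(size) for size in group_sizes)


# Hypothetical example: 1000 proteins split into 10 groups of 100.
group_sizes = [100] * 10
total_proteins = sum(group_sizes)

all_pairs = pair_count(total_proteins)           # all-against-all
grouped_pairs = grouped_pair_count(group_sizes)  # within groups only

print(f"all-against-all: {all_pairs} comparisons")
print(f"within groups:   {grouped_pairs} comparisons")
```

With these invented numbers, restricting comparisons to groups cuts the number of pairs by a factor of about ten; the actual saving depends on how evenly the proteins are distributed across groups.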
PorthoDA is implemented in Python and C++ and is freely available under the GNU GPL version 3 at http://www.bornberglab.org/pages/porthodom.