Comparative genomics has become an important strategy in many research areas of life sciences.
While many genes, especially well-conserved ones, and the proteins they encode can be well characterized by assigning orthologs, a significant number of proteins or their domains remain obscure "orphans".
Whereas some orphans are simply overlooked by current computational methods because they are rapidly diverging, others are de novo, i.e. they emerged relatively recently from previously non-coding regions of the DNA. Recent research has demonstrated the importance of orphans and, in particular, of de novo proteins and domains for the development of new phenotypic traits and for adaptation. Therefore, new approaches for detecting novel domains are of paramount importance.
The hydrophobic cluster analysis (HCA) method allows some of the limitations of established alignment-based and profile-search methods to be bypassed. In the present study, HCA is tested for the detection of orphan domains in the 12 Drosophila genomes. Detected orphan domains are classified into two categories, depending on their presence or absence in distantly related species. The two categories show significantly different physico-chemical properties when compared to previously characterized domains from the Pfam database. The newly detected domains have a higher degree of conserved intrinsic disorder than those already present in the Pfam database. Newly detected domains also have a particular composition in hydrophobic clusters, and results indicate that the older the domains are, the more similar their hydrophobic cluster content is to that of extant domains from the Pfam database. Taken together, the results indicate that, over time, newly created domains acquire a canonical set of hydrophobic clusters but conserve some intrinsically disordered regions.
These results are in agreement with previous findings on orphan domains and suggest that the physico-chemical properties of domains change over evolutionarily long time scales following a coherent pattern. Since the presented HCA-based method is able to detect even protein domains with such unusual properties and does not rely on prior knowledge, such as the availability of homologs to build HMMs, it has considerable potential for complementing existing strategies for annotating novel genomes and for better understanding how novel molecular features emerge.