Proteins are the workhorses of the cell and, over billions of years, they have evolved an amazing plethora of extremely diverse and versatile structures with equally diverse functions. Evolutionary emergence of new proteins and transitions
between existing ones are believed to be rare or even impossible. However, recent advances in comparative genomics have repeatedly identified some 10-30% of all genes as lacking any detectable similarity to existing proteins.
Even after careful scrutiny, some of those orphan genes contain protein-coding reading frames with detectable transcription and translation. Thus, some proteins seem to have emerged from previously non-coding "dark genomic matter". These "de novo" proteins tend to be disordered, fast-evolving and weakly expressed, yet they also tend to rapidly assume novel
and physiologically important functions.
Here we review the mechanisms by which "de novo" proteins might be created, the circumstances under which they may become fixed and the reasons why they are elusive. We propose a "grow slow and moult" model in which a reading frame is first extended, coding for an initially disordered and non-globular appendage which, over time, becomes more structured and may also become associated with other proteins.