The Molecular and Genomic Basis of Phenotypic Innovations:
We are primarily concerned with the question of how molecular changes cause new phenotypes to emerge. Our main goal is to understand, from as many angles as possible, how protein evolution works. Furthermore, we want to understand which of the vast number of possible evolutionary paths at the molecular level are facilitated or hindered by the biophysical constraints that govern the structure and function of proteins and RNAs. Accordingly, we use computational simulations, transcriptomics (RNA-seq), genomic data analyses and wet-lab directed evolution experiments to investigate the phenotypic effects of genotypic changes.
In particular, we aim to understand evolvability (many mutations of a genotype create novel phenotypes), robustness (most mutations preserve the phenotype), lock-in (the evolutionary history constrains the evolvability of a molecule), canalisation (the evolutionary history constrains phenotypic plasticity), epistatic ratchets (two or more neutral or deleterious mutations become beneficial in concert) and the role of promiscuity (multi-functionality) in escaping adaptive conflicts, in which a molecule has two or more fitness-relevant functions but cannot optimise all of them simultaneously.
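To make the epistatic-ratchet definition above concrete, the sketch below works with the fitness of a wild type, two single mutants and their double mutant. It is a minimal illustration, assuming multiplicative fitness effects (so epistasis is measured on a log scale); the function names and fitness values are hypothetical and not taken from our own tools.

```python
import math

def log_epistasis(w_wt: float, w_a: float, w_b: float, w_ab: float) -> float:
    """Deviation (log scale) of the double mutant from the multiplicative
    expectation w_a * w_b / w_wt; 0 means the two mutations act independently."""
    return math.log(w_ab) - math.log(w_a) - math.log(w_b) + math.log(w_wt)

def is_epistatic_ratchet(w_wt: float, w_a: float, w_b: float, w_ab: float,
                         tol: float = 1e-9) -> bool:
    """True if neither single mutation is beneficial on its own
    (fitness <= wild type) but the double mutant is beneficial."""
    singles_not_beneficial = w_a <= w_wt + tol and w_b <= w_wt + tol
    return singles_not_beneficial and w_ab > w_wt + tol

# Hypothetical example: one deleterious and one neutral mutation that are
# beneficial in combination.
print(is_epistatic_ratchet(1.0, 0.9, 1.0, 1.2))      # True
print(round(log_epistasis(1.0, 0.9, 1.0, 1.2), 3))   # 0.288
```

An additive model would instead compare the double-mutant effect with the sum of the single-mutant effects; which scale is appropriate depends on the system being studied.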
To enable adaptation to continuously changing environments, protein-coding genes continually explore the available sequence space through mutation. However, potentially beneficial changes are vastly outnumbered by detrimental ones, so every protein also has to be robust to mutation. How well a gene balances adaptability against the maintenance of general functionality ultimately determines which mutations are fixed during adaptive evolution. In other words: how do evolving genes (or the proteins they encode) change their dominant and/or latent function(s) without ever getting stuck in fitness minima?
Recent results illustrate the importance of sub-optimal or promiscuous functions for the adaptation of protein-coding genes toward new functions. Unfortunately, this makes any modelling of fitness landscapes (and therefore rational design) considerably more complicated. We use simple model systems to characterise the fitness landscapes of molecules and predict their evolvability. In particular, we investigate how the degree of neutrality (the fraction of neutral mutations for a given genotype) in sequence space and multi-functionality affect evolvability, i.e. the number of new phenotypes that can be reached within a few mutations.
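Both quantities defined above, neutrality and evolvability, can be counted directly once a genotype-phenotype map is fixed. The sketch below assumes a deliberately simple, hypothetical map (binary genotypes whose phenotype is the majority bit in each block of three sites, so that some mutations are buffered); it is not one of our model systems, only an illustration of how neutrality and k-step evolvability are computed.

```python
from typing import Callable, Iterator, Set

def point_mutants(genotype: str, alphabet: str = "01") -> Iterator[str]:
    """All genotypes differing from `genotype` at exactly one site."""
    for i, current in enumerate(genotype):
        for letter in alphabet:
            if letter != current:
                yield genotype[:i] + letter + genotype[i + 1:]

def toy_phenotype(genotype: str, block: int = 3) -> str:
    """Hypothetical genotype-phenotype map: majority bit of each block."""
    blocks = [genotype[i:i + block] for i in range(0, len(genotype), block)]
    return "".join("1" if b.count("1") * 2 > len(b) else "0" for b in blocks)

def neutrality(genotype: str, phenotype_of: Callable[[str], str],
               alphabet: str = "01") -> float:
    """Fraction of single-point mutants that keep the phenotype."""
    target = phenotype_of(genotype)
    mutants = list(point_mutants(genotype, alphabet))
    return sum(phenotype_of(m) == target for m in mutants) / len(mutants)

def neighbourhood(genotype: str, steps: int, alphabet: str = "01") -> Set[str]:
    """All genotypes reachable within `steps` point mutations (excluding the start)."""
    seen = {genotype}
    frontier = {genotype}
    for _ in range(steps):
        frontier = {m for g in frontier for m in point_mutants(g, alphabet)} - seen
        seen |= frontier
    return seen - {genotype}

def evolvability(genotype: str, phenotype_of: Callable[[str], str],
                 steps: int = 1, alphabet: str = "01") -> int:
    """Number of distinct new phenotypes reachable within `steps` mutations."""
    target = phenotype_of(genotype)
    reachable = {phenotype_of(g) for g in neighbourhood(genotype, steps, alphabet)}
    return len(reachable - {target})

print(neutrality("000000000", toy_phenotype))             # 1.0
print(evolvability("000000000", toy_phenotype, steps=1))  # 0
print(evolvability("000000000", toy_phenotype, steps=2))  # 3
```

In this toy map the fully buffered genotype is maximally robust yet reaches no new phenotype in a single step, which illustrates why robustness and evolvability are best assessed together and over more than one mutational step.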
Furthermore, we use predicted ancestral states of enzymes, in particular those at points of functional divergence, to unravel their evolutionary potential and understand their functional switches during evolution. This should ultimately lead to a so-called evolvability assessment, based on experimentally determined biochemical/biophysical characteristics, which can be used to select the best of several candidate enzymes as a starting point for engineering a better protein for a particular (commercially valuable) function.
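Ancestral states are usually inferred by maximum likelihood, but the underlying idea can be illustrated with the much simpler Fitch small-parsimony algorithm, used here as a stand-in for a single alignment column. The tree topology, taxon names and residues are hypothetical; on a rooted binary tree, the set this bottom-up pass returns for the root is the set of most-parsimonious ancestral residues.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch(tree: Tree, residues: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up pass of Fitch's algorithm for one alignment column.

    Returns the candidate state set at `tree` and the minimum number of
    substitutions needed in the subtree below it."""
    if isinstance(tree, str):
        return {residues[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, residues)
    right_set, right_cost = fitch(right, residues)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical column: residue at one position in four extant enzymes.
column = {"taxon_1": "K", "taxon_2": "K", "taxon_3": "R", "taxon_4": "K"}
tree = (("taxon_1", "taxon_2"), ("taxon_3", "taxon_4"))
print(fitch(tree, column))  # ({'K'}, 1)
```

Assigning one residue to every internal node needs a second, top-down pass, and likelihood-based methods additionally take branch lengths and substitution models into account, which parsimony ignores.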
We assume that genes simultaneously optimised for two functions will only evolve if the fitness benefit of keeping both functions outweighs the cost of reduced fitness in one of the traits. If this is the case, subsequent gene duplication with sub-functionalisation is likely to provide an additional advantage. We will use computational and experimental studies to test these premises and the influence of network connectivity on the shape and structure of the fitness landscape.
People: Bert van Loo, April Kleppe, Jasmin Kurafeiski, Berndjan Eenink
Funding: BBSRC (2002–2005), DAAD (2006–2007), HFSP (2013–2017), EC Horizon 2020 ITN (2017–2020).
Techniques employed: computational: ancestor reconstruction, phylogenies, calculation of stability effects of mutations (e.g. FoldX), simulations of population dynamics, disorder prediction; experimental: high-throughput functional molecular screening using in-cell assays and "lab-on-a-chip" micro-droplets (Hollfelder group), measurement of stability and structural dis-/order (CD), detection assays, cloning, (over-)expression, purification, SDS-PAGE, E. coli autodisplay (Jose), differential scanning fluorimetry to study unfolding and thermodynamic stability, E. coli clone libraries such as ASKA.
Modularity is a hallmark of molecular evolution: whether in gene regulation, metabolic pathways or signalling cascades, the ability to reuse autonomous modules in different molecular contexts can expedite evolutionary innovation. Over time scales of several hundred million years, the evolution of protein-coding genes is dominated by modular rearrangements of protein domains, which are their evolutionary, structural and functional units. While a small core of arrangements is universal, a large fraction of multi-domain arrangements is species-specific and has been created recently via gene duplication, fusion and terminal loss of domains. Surprisingly, along every lineage thousands of domains have been completely lost from genomes (i.e. all copies within a genome) in a stochastic manner, at a fairly constant rate of ca. four domains per million years.
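If complete domain losses are modelled as a Poisson process with the roughly constant rate quoted above (about four domains per million years), the expected number of losses along a branch and the spread around it follow directly. This is a minimal sketch under that modelling assumption; the branch length in the example is hypothetical.

```python
import math

LOSS_RATE_PER_MY = 4.0  # approximate rate of complete domain loss quoted in the text

def expected_losses(branch_my: float, rate: float = LOSS_RATE_PER_MY) -> float:
    """Expected number of domains completely lost along a branch of `branch_my` MY."""
    return rate * branch_my

def loss_probability(k: int, branch_my: float, rate: float = LOSS_RATE_PER_MY) -> float:
    """Poisson probability of exactly k losses, computed in log space to
    avoid overflow for long branches."""
    lam = rate * branch_my
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

# Hypothetical 100 MY branch: mean 400 losses, standard deviation sqrt(400) = 20.
mean = expected_losses(100)
print(mean, math.sqrt(mean))  # 400.0 20.0
```

Under this assumption the number of lost domains grows linearly with branch length, with a Poisson spread of about the square root of the mean.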
Novel domains are rarely fixed and arise either as their own genes or terminally, by extension of existing reading frames. Novel domains are under strong selection and confer a high fitness value, as indicated by how rapidly they attain high copy numbers within genomes; they are involved in biotic defence, reproduction and development.
Using cross-species genomic comparisons and population genomics, we investigate the genetic mechanisms and biophysical constraints of domain emergence, and we develop algorithms for rapidly screening many genomes, inferring their phylogenies, and comparing, aligning and clustering sequences. This is possible because of a large reduction in complexity (a protein can be characterised as a linear arrangement of 5-6 domains drawn from an alphabet of several thousand domain types, as opposed to ca. 500 amino acids drawn from an alphabet of 20), the conservation of linear domain order, and the high reliability of the HMMs that characterise the domains.
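Treating each protein as a short string over an alphabet of domain identifiers means that standard string algorithms apply at the domain level. A minimal sketch: the Levenshtein distance between two domain arrangements, with unit costs for inserting, deleting or substituting a whole domain. The domain labels are placeholders, and unit costs are a simplifying assumption; this is not the scoring scheme of our own tools.

```python
from typing import Sequence

def arrangement_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two domain arrangements, where each
    symbol is a whole domain (e.g. an HMM-annotated family), not a residue."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, dom_a in enumerate(a, start=1):
        curr = [i]
        for j, dom_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                     # delete dom_a
                curr[j - 1] + 1,                 # insert dom_b
                prev[j - 1] + (dom_a != dom_b),  # keep or substitute a domain
            ))
        prev = curr
    return prev[-1]

# Hypothetical arrangements: a terminal domain loss costs a single edit.
print(arrangement_distance(["SH3", "SH2", "Pkinase"], ["SH2", "Pkinase"]))  # 1
```

Comparing two six-domain arrangements fills a 7 × 7 table, versus roughly 501 × 501 cells for two 500-residue sequences, which is one reason arrangement-level comparisons scale to many genomes. Real scoring schemes may, for instance, weight terminal and internal changes differently.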
We experimentally test whether rearranged domains increase evolvability in biochemical pathways, in defence against pathogens and during developmental processes.
The software we develop is available here: https://domainworld.uni-muenster.de/
People involved: Carsten Kemena, Steffen Klasberg, Elias Dohmen.
Funding: Volkswagen Foundation (two grants, 2009–2013); DFG (2009–2013).
Techniques employed: design of Hidden Markov Models, suffix arrays, recursive dynamic programming, Approximate Bayesian Computation, parsimony and maximum likelihood, design and implementation of string algorithms, databases and interactive graphical interfaces; dN/dS ratio tests (PAML); experimental (planned): see also P1.
With every new genome sequenced, a few hundred predicted genes remain "orphans" because computational methods cannot assign any orthologs to them, even in closely related and well-annotated species. Presumably, many of these lineage-specific genes are transcribed, some are translated, and the resulting proteins are functional and adaptive, at least under some (possibly unknown) conditions. De novo emergence not only contradicts the prevailing view that most novel genes emerge from old ones, it is also difficult to reconcile with a biophysical perspective: novel reading frames emerging from previously non-coding sequence must be considered extremely unlikely to be beneficial, because the proteins they encode would most likely be disordered and prone to aggregation, and thus deleterious, or at least be purged for purely energetic reasons. So where do new coding genes actually come from, how do they function, and how is their potentially detrimental expression regulated?
We ask where novel protein-coding genes come from and how genomic novelties and rearrangements trigger adaptation and spur developmental transitions. Using comparative genomics and biophysical analyses (computational and experimental), we test their properties and functions. We found that most genetic novelty comes from novel domains, but that many completely new reading frames also emerge, e.g. across the insect tree, at an estimated rate of 500 new genes in the wake of each speciation event. The emergence of novel domains has been termed "grow slow and moult" because some novel domains later lose their initially stabilising parent protein and become independent and amenable to further rearrangements.
We concentrate on some major transitions in the evolution of extant life forms: signalling across multicellular organisms, placentation in mammals, the emergence of holometabolism in insects, and the onset and reversal of ageing.
Furthermore, to catch novel genes "in the act" of emergence, we compare genomes not only between species but also between populations, using closely related sister species as an outgroup. Using gene and domain prediction programmes, we identify novel ORFs, measure their expression (RNA-seq) and, if necessary, confirm them, e.g. with long-read and primer-walking PCR and with qPCR. We are currently screening several systems (populations of fish, mice, flies and humans) to achieve good genomic coverage for detecting recent emergence events. We reconstruct ancestral sequences, which can then be tested for their genetic origin, and investigate their structural and biophysical properties with TSA, CD, NMR and phage display experiments. Additionally, we aim to compare the behaviour of the predicted ancestral de novo gene with that of the extant one using in vitro Drosophila experiments.
People: Andreas Lange, Anna Granchamp, Brennen Heames, Daniel Dowling, Jonathan Schmitz, Steffen Klasberg.
Funding: Leibniz Gemeinschaft (2013–2016).
Techniques employed: computational: comparative genomics, differential GO analysis, biophysical predictions (disorder, secondary structure, hydrophobic clusters), ancestral reconstruction and phylogenies, mutational effects on stability (FoldX, Rosetta); experimental: deep sequencing, qPCR, antibody staining, cloning, (over-)expression, purification, SDS-PAGE expression quantification, E. coli autodisplay (Jose), in-situ hybridisation, CD, stability measurements, in-cell NMR (Selenko), pull-down assays (Ivarsson), in vitro expression of ancestral de novo genes (Findlay).
Sociality is considered one of the major transitions in evolution, but little is known about its genomic basis and the associated selectable traits. Social insects are excellent study systems because their genomes are relatively simple to analyse and many speciation events gave rise to morphologically and ecologically diverse species within a relatively short time. Furthermore, in insects, sociality has evolved independently at least twice, in Hymenoptera (comprising ants and bees) and in termites. Both groups also show a striking reversal of the otherwise widespread trade-off between longevity and fecundity. Moreover, loss and/or reversal of social behaviour has been observed, since several ant species parasitise or enslave the colonies of very closely related species.
We have investigated the genomes and transcriptomes of several insects, most of which are either social or closely related outgroups thereof. Using standard bioinformatic techniques and several of our in-house algorithms, we identified horizontally transferred genes from bacterial parasites, several genes under adaptive evolution, and novel genes that were instrumental for the ecological success of social insects. Comparing social with non-social insects, we found that both novel genes and the rewiring of regulatory networks play a big role in the regulation of sociality. We also consider epigenetic marks, the effects of methylation/acetylation and of small regulatory RNAs on the differentiation of individuals during their development, to test the role of epigenetic marks in general and their effect on de novo genes and on rearranged genes.
People involved: Mark Harrison, Alice Seguret
Funding: DFG (2015–2019, 2017–2021)
Techniques employed: Cufflinks suite; 454, Illumina, PacBio, Oxford Nanopore; ALLPATHS-LG, SOAPdenovo, MIRA, Platanus, SPAdes, MaSuRCA; MAKER; CEGMA, DOGMA; PorthoDom, Proteinortho; RAxML, codeml/PAML and many others.
It has long been assumed that an organism's blueprint lies entirely in its genome. Advances over the last decade have demonstrated an ever-increasing role of variation that occurs at the level of populations, between almost identical individuals, between cells in the same tissue and so forth. Deciphering the roles of this variation has become increasingly important for pushing the limits of genomics further and for improving our understanding of how biological novelty arises.
We are using the methods and insights from the other projects to understand the emergence of evolvability and robustness, and the creation and maintenance of genetic diversity, from the perspective of protein-coding genes, their duplications and rearrangements.