February 2011 - January 2013: Post-Doctoral Research Associate at the Institute for Evolution and Biodiversity, in the Evolutionary Bioinformatics group of Erich Bornberg-Bauer - Westfälische Wilhelms University Münster.
Dec. 2010: Ph.D Degree in Computer Science (supervisors: Laurent Bréhélin and Olivier Gascuel; examiners: Daniel Khan, Jacques Nicolas, Nicolas Hulo and Éric Maréchal)
Topic of the Thesis: Searching for Divergent Protein Domains with Hidden Markov Models: Application to Plasmodium falciparum - University of Sciences Montpellier 2.
Oct. 2010 - Dec. 2010: CNRS Research Engineer at the Laboratory of Informatics, Robotics and Micro-Electronics of Montpellier (LIRMM), in the Methods and Algorithms for Bioinformatics (MAB) group led by Olivier Gascuel.
Sep. 2009 - Aug. 2010: Teaching Assistant (University of Arts Montpellier 3) - Ph.D studies in LIRMM, MAB Group.
Oct. 2008 - Aug. 2009: Teaching Assistant (ENSIMAG Grenoble - Top school of Computer Sciences and Mathematics) - Ph.D studies in the Laboratory of Cellular Physiology of Plants (LPCV) in the group of Éric Maréchal.
Oct. 2005 - Oct. 2008: Teaching Assistant (Technological Institute of Computer Sciences in Montpellier) - Ph.D studies in LIRMM, MAB Group.
Jul. 2005: Master's degree in Computer Science - University of Sciences Montpellier 2.
Specialities: Bioinformatics, Database Management, Machine Learning, Automatic processing of natural language and Distributed systems.
Jun. 2003: Licence degree in Computer Science - University of Sciences Montpellier 2.
Functional annotation of proteins
Database Management System
I am mainly interested in protein domains. In particular, I study how to improve domain detection, how domain architectures evolve, and how domains can be used for the functional annotation of proteins.
Protein domains are structural and functional subunits of proteins. As such, they play a key role in function prediction in bioinformatics. Numerous protein domain databases are now available online (e.g. InterPro, Pfam). These databases offer probabilistic models, mainly Hidden Markov Models (HMMs). The models are built from alignments of manually curated families of homologous domains. Given a new protein sequence, these models make it possible to establish the domain architecture of the protein.
However, this approach has an obvious limitation: many domains may be missed because their sequences are phylogenetically distant from those used to build the models. My first research project (Ph.D thesis) therefore aimed to overcome this limitation by developing statistical and learning methods. On the one hand, my work enhanced domain detection by exploiting the co-occurrence properties of protein domains. This approach detected numerous new Pfam domains in Plasmodium falciparum with low error rates. It also brought new Gene Ontology (GO) annotations, notably by annotating domain groups. By applying the approach to ten other human pathogens, I created the EuPathDomains database. On the other hand, I developed methods to correct probabilistic models that are ill-suited to compositional bias. I put forward different corrections based on numerical, evolutionary, statistical and taxonomic techniques.
I am now interested in the emergence of new protein domains and of new domain arrangements. This topic is a central theme in our team at the IEB. However, my main work during my post-doc relates to the forthcoming publication of the first genome sequence of a termite species.
Database / Tools / Software
EuPathDomains is an extended database of protein domains in several eukaryotic pathogens from EuPathDB.
The EuPathDomains database gathers known InterPro domain occurrences and new Pfam domain occurrences found by the CODD procedure [Terrapon et al., 2009]. CODD improves the sensitivity of Pfam domain detection by exploiting the tendency of each domain to appear preferentially with a few favored partner domains in a protein. This property enables CODD to certify the presence of a divergent domain on the basis of the presence of another domain in the same protein.
The database contains domains for Giardia lamblia, Trypanosoma brucei, three Leishmania species, and five apicomplexan species including three Plasmodium species, Toxoplasma gondii and Cryptosporidium parvum.
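The co-occurrence idea behind CODD can be illustrated with a small Python sketch. It is not the published procedure: a simple count threshold (`min_support`) stands in for whatever statistical criterion CODD actually uses, and the function names and domain names are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def favored_pairs(architectures, min_support=2):
    """Return domain pairs seen together in at least `min_support` proteins.

    Assumption: a raw count threshold replaces the real statistical test.
    """
    counts = Counter()
    for domains in architectures:
        # Count each unordered pair once per protein.
        for pair in combinations(sorted(set(domains)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_support}


def certify(confident, candidates, pairs):
    """Keep candidate (low-scoring) domains that form a favored pair
    with at least one confidently detected domain of the same protein."""
    kept = []
    for cand in candidates:
        if any(tuple(sorted((cand, dom))) in pairs
               for dom in confident if dom != cand):
            kept.append(cand)
    return kept


# Toy annotations with made-up domain names.
known = [
    ["Kinase", "SH2"],
    ["Kinase", "SH2", "SH3"],
    ["Kinase", "SH3"],
    ["Zinc_finger", "Homeobox"],
]
pairs = favored_pairs(known, min_support=2)

# A protein with one confident domain and two uncertain hits.
result = certify(confident=["Kinase"], candidates=["SH2", "Homeobox"], pairs=pairs)
print(result)
```

In this toy case only the uncertain hit that frequently accompanies the confident domain is retained, which is the intuition behind certifying a divergent domain through its partner.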
RADS retrieves distant homologs based on alignments of domain architectures instead of the usual amino-acid sequences. The webserver is a proof-of-concept of the RADS-RAMPAGE algorithms (publication submitted).
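As a rough illustration of what aligning domain architectures means, the sketch below runs a generic global alignment (Needleman-Wunsch style dynamic programming) over lists of domain identifiers rather than amino acids. The scoring values, function names and toy proteins are assumptions made for this example; the actual RADS-RAMPAGE algorithms and their scoring are not reproduced here.

```python
def align_score(a, b, match=2, mismatch=-1, gap=-1):
    """Global alignment score between two domain architectures,
    each given as a list of domain identifiers (scores are assumed values)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align two domains
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
    return prev[-1]


def rank_hits(query, database):
    """Order database proteins by decreasing architecture similarity to the query."""
    return sorted(database,
                  key=lambda name: align_score(query, database[name]),
                  reverse=True)


# Toy database with made-up proteins and domain names.
database = {
    "protA": ["SH3", "SH2", "Kinase"],
    "protB": ["SH2", "Kinase"],
    "protC": ["Zinc_finger", "Homeobox"],
}
query = ["SH3", "SH2", "Kinase"]
ranking = rank_hits(query, database)
print(ranking)
```

Because each protein is reduced to a short list of domains, such comparisons operate on far fewer symbols than residue-level alignments, which is what makes an architecture-based search attractive as a fast first pass.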
DoMosaics is a powerful Java program for domain-based analysis and visualization of proteins. This tool supports the annotation of protein domains and the visualization of architectures, and also provides several utilities (such as dotplots and tree computations) and statistics. Most importantly, it makes it possible to study how domain arrangements change during evolution.
Moore AD, Held A, Terrapon N, Weiner J and Bornberg-Bauer E.
DoMosaics: Software for domain arrangement visualization and domain-centric analysis of proteins.
Terrapon N*, Weiner J*, Grath S, Moore AD and Bornberg-Bauer E.
Rapid similarity search of proteins using alignments of domain arrangements
Nicolas Terrapon*, Cai Li*, Hugh M. Robertson, Lu Ji, Xuehong Meng, Warren Booth, Zhensheng Chen, Christopher P. Childers, Karl M. Glastad, Kaustubh Gokhale, Johannes Gowin, Wulfila Gronenberg, Russell A. Hermansen, Haofu Hu, Brendan G. Hunt, Ann Kathrin Huylmans, Sayed M. S. Khalil, Robert D. Mitchell, Monica C. Munoz-Torres, Julie A. Mustard, Hailin Pan, Justin T. Reese, Michael E. Scharf, Fengming Sun, Heiko Vogel, Jin Xiao, Wei Yang, Zhikai Yang, Zuoquan Yang, Jiajian Zhou, Jiwei Zhu, Colin S. Brent, Christine G. Elsik, Michael A. D. Goodisman, David A. Liberles, R. Michael Roe, Edward L. Vargo, Andreas Vilcinskas, Jun Wang, Erich Bornberg-Bauer, Judith Korb, Guojie Zhang, Jürgen Liebig
Molecular traces of alternative social organization in a termite genome
Nature Communications, 2014
Terrapon N, Grath S, Weiner J, Moore AD and Bornberg-Bauer E
Fast Homology Search Using Domain-Architecture Alignment
JOBIM, Conference proceedings, 2012
Terrapon N., Gascuel O., Maréchal É. and Bréhélin L.
Fitting hidden Markov models of protein domains to a target species: application to Plasmodium falciparum.
BMC Bioinformatics, 2012
Ghouila A.*, Terrapon N.*, Gascuel O., Guerfali F., Laouini D., Maréchal É. and Bréhélin L. (* first authors)
EuPathDomains : the divergent domain database for eukaryotic pathogens.
Infection, Genetics and Evolution, 2011
Terrapon N., Gascuel O., Maréchal É. and Bréhélin L.
Detection of New Protein Domains by Co-occurrence: Application to Plasmodium falciparum.
Journées Ouvertes en Biologie, Informatique et Mathématiques (JOBIM Paris 2011). Proceedings - Oral Communication
Journées Ouvertes en Biologie, Informatique et Mathématiques (JOBIM Nantes 2009). Proceedings - Oral Communication