🐦 The IBIS Challenge

Community Article Published April 6, 2024

Join the IBIS Challenge: an open competition in Inferring and predicting transcription factor Binding Specificities.

Deciphering human gene regulation is a cornerstone of modern molecular biology and biomedicine. At the regulatory sequence level, the grammar of gene regulation is defined by the binding specificities of special proteins, the transcription factors, which act at particular "genomic addresses" by recognizing 🧬 DNA sequence patterns in gene regulatory regions. Drawing inspiration from DREAM and Kaggle competitions, we invite you to join IBIS (ibis.autosome.org), an open challenge in computational sequence analysis for Inferring Binding Specificities of human transcription factors with classic bioinformatics and advanced machine learning (ML).

IBIS aims at a fair assessment of existing and novel methods solving the long-standing problem of DNA motif discovery: identifying and modeling recurrent DNA text patterns recognized by human transcription factors. In IBIS, we will assess classic methods as well as diverse modern approaches of arbitrary complexity.

🚀 These include, but are not limited to, decision trees on top of k-mer frequencies, hidden Markov models (HMMs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM) models, as well as attention and transformer-based models.
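To make the simplest of these options concrete, here is a minimal sketch of turning a DNA sequence into a k-mer frequency vector. It is an illustration only, not part of the IBIS tooling: the function name `kmer_frequencies` and the default `k=3` are assumptions chosen for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies of a DNA sequence.

    The output follows the lexicographic order of all 4**k k-mers
    over the alphabet ACGT (hypothetical helper, for illustration).
    """
    seq = seq.upper()
    # Fixed feature order: AAA, AAC, AAG, ... for k=3
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing N or other symbols are left out of the total
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


features = kmer_frequencies("ACGTACGTTAGC", k=2)
print(len(features))  # 16 features for k=2
```

Vectors like this, computed for bound and unbound sequences, could then be passed to a tree-based classifier such as scikit-learn's `DecisionTreeClassifier`.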

💡 IBIS allows arbitrary use of the human genome or random DNA sequences to pre-train an artificial neural network or to extract features, as long as this is performed from scratch. In particular, we allow using:

  • hg38 human genome assembly (including any existing DNA language models pretrained solely on the genome sequence);
  • precomputed biophysical features derived from the DNA sequences, such as the DNA shape features;
  • RepeatMasker track;
  • protein-level metadata on transcription factors (including but not limited to protein sequence and domain information) available directly in UniProt (so, in theory, you can demonstrate the power of pre-training on protein sequences).

More details can be found in the IBIS documentation.

📊 To solve the challenge problem, IBIS provides a diverse array of unpublished experimental data on 40 human regulatory proteins, many of which remain unexplored in terms of preferred DNA binding patterns.

The challenge proceeds in two stages: the online Leaderboard (10 transcription factors) and the offline Final round (the remaining 30 transcription factors). Winners will be announced separately for each stage. 🏆 The best methods of both stages will be highlighted in the post-challenge high-impact scientific paper, and the winners of the Primary track of the Final round will be invited to contribute as co-authors.
