RFDiffusion Potentials

Community Article Published April 14, 2024

TLDR: This is an introduction to using "guiding potentials" with RFDiffusion, written primarily for those already somewhat comfortable with using RFDiffusion. Mathematical descriptions are provided, along with recommended settings for the potentials and example CLI commands.

Introduction

RFDiffusion is a Denoising Diffusion Probabilistic Model (often simply called a "DDPM" or "diffusion model") that can generate 3D protein backbones de novo, either unconditionally or conditioned on various kinds of constraints. It is a powerful tool if you know how to use it, especially in conjunction with other models such as LigandMPNN, RoseTTAFold, RoseTTAFold All Atom, and AlphaFold-Multimer. The model supports several interesting functionalities: unconditional generation of monomers, unconditional generation of symmetric oligomers, motif scaffolding, partial diffusion for refining protein structures, sequence inpainting, binder design, and binder design with motif scaffolding.

The model can also be conditioned on various potentials. With RFDiffusion we can, for example, easily generate multiple high-affinity, high-specificity binders to almost any target protein of interest. A good first example for users to try would be to design a binder for the PD-L1 protein in order to disrupt its interaction with the PD-1 protein. This interaction effectively turns off immune cells and prevents them from attacking cancer cells, so disrupting it with a high-affinity, high-specificity binder for PD-L1 is one useful approach to treating various kinds of cancer.

We might also use RFDiffusion, in conjunction with a tool like AlphaFold-Multimer, to graft various motifs from different proteins and design things like adjuvants. There is also a version of RFDiffusion called RFDiffusion All Atom, which can design proteins with binding pockets with high shape complementarity to any small molecule target. This can also be combined with motif scaffolding to generate proteins with desired functional motifs in addition to the desired binding pocket for the small molecule target of interest.

In this post you will find an introduction to using potentials with RFDiffusion. Such an introduction has yet to be written, and using potentials is something of an art form, at present known only to a select few researchers who have invested many hours into honing their skills with this model; hopefully this proves to be a helpful reference. Potentials are inspired by and taken from Molecular Dynamics simulation, often referred to simply as "MD". They aren't strictly necessary for using the model and can be omitted if you are just getting started with RFDiffusion. This post is aimed at more advanced users who are comfortable with the basics of RFDiffusion and who wish to improve their results and exercise more fine-grained control over the output of the model.

Ideally, at least some of these potentials can be transferred to the new RFDiffusion All Atom, or simply "RFDiffusion-AA" which was recently released. So, with a little work, you should be able to apply some of this information to RFDiffusion-AA as well. This post is a summary of what I have been able to piece together based on the RFDiffusion codebase and the documentation therein, along with the information from the Plumed v2.7 documentation referenced in the file potentials.py on lines 122, 154, and 193. See also this Plumed reference.

The Mathematics Describing the Various Potentials

monomer_ROG and binder_ROG:
- Equation: $ROG = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\mathbf{r}_i - \mathbf{r}_{cm})^2}$ where $N$ is the number of Cα atoms, $\mathbf{r}_i$ is the position of the $i$-th Cα atom, and $\mathbf{r}_{cm}$ is the centroid (center of mass) of the Cα atoms.
- The potential is defined as: $V_{ROG} = -w \cdot ROG$ where $w$ is the weight parameter.
dimer_ROG:
- Equation: $ROG_{m1} = \sqrt{\frac{1}{N_{m1}} \sum_{i=1}^{N_{m1}} (\mathbf{r}_i - \mathbf{r}_{cm,m1})^2}$ $ROG_{m2} = \sqrt{\frac{1}{N_{m2}} \sum_{i=1}^{N_{m2}} (\mathbf{r}_i - \mathbf{r}_{cm,m2})^2}$ where $ROG_{m1}$ and $ROG_{m2}$ are the radii of gyration of monomers 1 and 2, respectively, and $\mathbf{r}_{cm,m1}$ and $\mathbf{r}_{cm,m2}$ are the centroids of each monomer.
- The potential is defined as: $V_{dimer\_ROG} = -w \cdot \frac{ROG_{m1} + ROG_{m2}}{2}$ where $w$ is the weight parameter.
binder_ncontacts and interface_ncontacts:
- Equation: $s(r_{ij}) = \frac{1 - \left(\frac{r_{ij} - d_0}{r_0}\right)^n}{1 - \left(\frac{r_{ij} - d_0}{r_0}\right)^m}$ where $r_{ij}$ is the distance between atoms $i$ and $j$, $d_{0}$ and $r_{0}$ are distance thresholds, and $n$ and $m$ are exponents (default $n = 6$, $m = 12$).
- The potential is defined as: $V_{contacts} = w \cdot \sum_{i=1}^{N} \sum_{j=1}^{N} s(r_{ij})$ where $w$ is the weight parameter.
monomer_contacts:
- Equation: $s(r_{ij}) = \frac{1 - \left(\frac{r_{ij} - d_0}{r_0}\right)^n}{1 - \left(\frac{r_{ij} - d_0}{r_0}\right)^m}$ where $r_{ij}$ is the distance between atoms $i$ and $j$, $d_{0}$ and $r_{0}$ are distance thresholds, and $n$ and $m$ are exponents (default $n = 6$, $m = 12$).
- The potential is defined as: $V_{monomer\_contacts} = w \cdot \sum_{i=1}^{N} \sum_{j=1}^{N} s(r_{ij})$ where $w$ is the weight parameter.
olig_contacts:
- Equation: $s(r_{ab}^{ij}) = \frac{1 - \left(\frac{r_{ab}^{ij} - d_0}{r_0}\right)^n}{1 - \left(\frac{r_{ab}^{ij} - d_0}{r_0}\right)^m}$ where $r_{ab}^{ij}$ is the distance between residue $a$ in chain $i$ and residue $b$ in chain $j$, $d_{0}$ and $r_{0}$ are distance thresholds, and $n$ and $m$ are exponents (default $n = 6$, $m = 12$).
- The potential is defined as: $V_{olig} = \sum_{i=1}^{N_c} \sum_{j=1}^{N_c} C_{ij} W_{ij} \sum_{a=1}^{N_r} \sum_{b=1}^{N_r} s(r_{ab}^{ij})$ where $N_{c}$ is the number of chains, $N_{r}$ is the number of residues per chain, $C_{ij}$ is the contact matrix entry for chains $i$ and $j$, and $W_{ij}$ is the weight factor (weight_intra for intra-chain contacts, weight_inter for inter-chain contacts).
substrate_contacts:
- Equation:
  - Attractive term: $c(d) = -\frac{1 - \left(\frac{d - d_0}{r_0}\right)^6}{1 - \left(\frac{d - d_0}{r_0}\right)^{12}}$ where $d$ is the minimum distance between a protein atom and any substrate atom, $d_{0}$ and $r_{0}$ are distance thresholds.
  - Repulsive term: $p(d) = \begin{cases} a |r - d|^{1.5},& \text{if } d < r \\ 0, & \text{otherwise} \end{cases}$ where $a$ and $r$ are parameters controlling the strength and distance scale of the repulsion.
- The potential is defined as: $V_{sub} = -w \cdot \sum_{i=1}^{N_p} \left[ s \cdot c(d_i) + p(d_i) \right]$ where $w$ is the weight parameter, $N_{p}$ is the number of protein Cα atoms, $d_{i}$ is the minimum distance between protein atom $i$ and any substrate atom, $s$ is a scaling factor for the attractive term.

These equations provide a mathematical description of the potentials implemented in the potentials.py file in the RFDiffusion codebase. The potentials are designed to encourage specific structural properties, such as compactness (monomer_ROG, binder_ROG, dimer_ROG), contacts within a monomer or between monomers (binder_ncontacts, interface_ncontacts, monomer_contacts), contacts in symmetric oligomers (olig_contacts), and protein-substrate interactions (substrate_contacts).

Potentials for Unconditional Generation

`monomer_ROG`

Overview: The monomer_ROG potential is used for unconditional generation of monomeric proteins. It encourages the model to generate compact structures by minimizing the radius of gyration (ROG) of the Cα atoms. A smaller ROG indicates a more compact structure.

Technical Details: The ROG is calculated as the root mean square distance of the Cα atoms from their center of mass. The potential returns the negative of the ROG, scaled by a weight factor. Guidance favors higher potential values (just as the contact potentials are maximized), so maximizing the negative ROG is equivalent to minimizing the ROG, that is, maximizing the compactness of the structure.

The key settings for this potential are:

weight: This parameter scales the overall strength of the potential. A higher weight will prioritize compactness more strongly, potentially at the expense of other structural features. The recommended range is 1-10, but the optimal value may depend on the specific protein and the other potentials being used.
min_dist: This parameter sets a minimum distance constraint between each Cα atom and the center of mass. It prevents the structure from collapsing too much and becoming unrealistic. The recommended range is 10-20 Å, based on the typical size of monomeric proteins. Setting min_dist too low may allow overly compact structures, while setting it too high may prevent the potential from having any effect.
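
To make the roles of weight and min_dist concrete, here is a minimal sketch of the quantity described above. It is plain Python rather than the PyTorch code in potentials.py, the function names are hypothetical, and it assumes that min_dist acts as a floor on each atom's distance to the centroid.

```python
import math

def radius_of_gyration(ca_coords, min_dist=0.0):
    """Root mean square distance of the Ca atoms from their centroid.

    Assumption: min_dist acts as a floor on each per-atom distance, so atoms
    already closer to the centroid than min_dist earn no further reward.
    """
    n = len(ca_coords)
    centroid = tuple(sum(c[k] for c in ca_coords) / n for k in range(3))
    squared = [max(math.dist(c, centroid), min_dist) ** 2 for c in ca_coords]
    return math.sqrt(sum(squared) / n)

def rog_potential(ca_coords, weight=1.0, min_dist=0.0):
    """V_ROG = -w * ROG, so more compact structures score higher."""
    return -weight * radius_of_gyration(ca_coords, min_dist)

# Toy input: four atoms on the corners of a square with side 2 Å
square = [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0)]
print(rog_potential(square, weight=5))               # floor disabled
print(rog_potential(square, weight=5, min_dist=15))  # every distance sits at the floor
```

In the second call every atom is already closer to the centroid than min_dist, so the value no longer changes when the atoms move slightly. This is the sense in which a min_dist that is too high can leave the potential with no effect.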

`monomer_contacts`

Overview: The monomer_contacts potential encourages the formation of contacts within a monomeric protein. It uses a differentiable contact definition based on a smooth switching function that goes from 1 (in contact) to 0 (not in contact) as the distance between atoms increases. Maximizing the sum of the switching function over all atom pairs favors structures with more contacts, which are generally more stable and well-packed.

Technical Details: The switching function is defined as: $s(r_{ij}) = \frac{1 - \left(\frac{r_{ij} - d_0}{r_0}\right)^n}{1 - \left(\frac{r_{ij} - d_0}{r_0}\right)^m}$

where $r_{ij}$ is the distance between atoms $i$ and $j$, $d_{0}$ and $r_{0}$ are distance thresholds, and $n$ and $m$ are exponents controlling the sharpness of the transition (default $n = 6$, $m = 12$).

The key settings for this potential are:

weight: Scales the overall strength of the potential. A higher weight will prioritize the formation of more contacts, potentially at the expense of other structural features. The recommended range is 1-10.
r_0: This parameter controls the distance at which the switching function transitions from 1 (in contact) to 0 (not in contact). It should be set based on the desired definition of a contact. A smaller value (e.g. 6 Å) will only consider very close atoms to be in contact, while a larger value (e.g. 12 Å) will include more distant interactions. The recommended range is 6-12 Å.
d_0: This parameter sets the distance below which the switching function is always 1 (definitely in contact). It should be smaller than r_0. The recommended range is 2-6 Å.
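
The sketch below evaluates the switching function exactly as written in the formula, to give a feel for how r_0 and d_0 shape it. It is an illustration in plain Python with hypothetical function names, not the PyTorch implementation. With the default exponents, $1 - x^{12} = (1 - x^6)(1 + x^6)$ for $x = (r_{ij} - d_0)/r_0$, so the ratio reduces to $1/(1 + x^6)$: it equals 1 at $r_{ij} = d_0$ and 0.5 at $r_{ij} = d_0 + r_0$.

```python
import math

def switching(r, d_0=4.0, r_0=8.0, n=6, m=12):
    """Rational switching function s(r) from the formula above."""
    x = (r - d_0) / r_0
    if math.isclose(x, 1.0):
        # The ratio is 0/0 at r = d_0 + r_0; its limit there is n / m.
        return n / m
    return (1.0 - x**n) / (1.0 - x**m)

def contacts_potential(ca_coords, weight=1.0, d_0=4.0, r_0=8.0):
    """w * sum of s(r_ij) over all ordered pairs (i, j), as in the formula."""
    total = 0.0
    for a in ca_coords:
        for b in ca_coords:
            total += switching(math.dist(a, b), d_0, r_0)
    return weight * total

# How quickly the contact score falls off with distance (in Å)
for r in (4.0, 8.0, 12.0, 16.0, 20.0):
    print(r, round(switching(r), 4))
```

Evaluated literally, the formula gives values slightly below 1 for distances shorter than d_0 (about 0.985 at r = 0 with these settings); whether an implementation clamps that region to exactly 1 is not something this sketch settles.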

Potentials for Symmetric Unconditional Generation

`olig_contacts`

Overview: The olig_contacts potential is designed for generating symmetric oligomeric proteins. It allows for specifying attractive, repulsive, or neutral interactions between different chains of the oligomer, enabling control over the overall topology and inter-chain contacts.

The potential calculates the sum of pairwise contact energies between Cα atoms in different chains, using a smooth contact definition similar to the monomer_contacts potential. The strength and sign of the interaction between each pair of chains is determined by a contact matrix provided by the user.

Technical Details: The key settings for this potential are:

weight_intra: Scales the strength of intra-chain contacts (interactions within each chain). A higher value will prioritize compactness of individual chains. The recommended range is 0.1-2.
weight_inter: Scales the strength of inter-chain contacts (interactions between different chains). A higher value will prioritize contacts at the inter-chain interfaces. The recommended range is 0.1-2.
contact_matrix: A square matrix specifying the desired interactions between pairs of chains. Each element of the matrix should be 1 (attractive), -1 (repulsive), or 0 (no interaction). The matrix should be symmetric, as interactions are pairwise. The dimensions of the matrix determine the number of chains in the oligomer.
olig_intra_all, olig_inter_all: Binary flags indicating whether to apply the contact potential to all intra-chain or inter-chain pairs, respectively. If set to True, the corresponding elements of the contact matrix will be overridden.
olig_custom_contact: A string specifying custom contact definitions for specific chain pairs. Each definition should be of the form "A&B" (attractive) or "A!B" (repulsive), where A and B are chain identifiers. Multiple definitions are separated by commas.
r_0, d_0: Distance thresholds for the switching function, similar to the monomer_contacts potential. The recommended ranges are 6-12 Å for r_0 and 2-6 Å for d_0.

The olig_contacts potential provides a flexible way to control the topology of symmetric oligomers. By carefully designing the contact matrix and adjusting the interaction weights, a wide range of geometries can be generated, from cyclic and dihedral symmetries to more complex arrangements. The potential can also be used in combination with other potentials, such as monomer_ROG, to simultaneously control the compactness and inter-chain interactions of the oligomer.
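
To illustrate how the custom contact syntax maps onto a contact matrix, here is a small sketch. The helper build_contact_matrix is hypothetical and written only for this post; it is not the parser used by RFDiffusion, and it assumes chains are identified by single labels such as A, B, and C.

```python
def build_contact_matrix(chain_ids, custom_contacts):
    """Build a symmetric chain-by-chain matrix with entries 1, -1 or 0.

    custom_contacts uses the "A&B" (attractive) / "A!B" (repulsive) syntax
    described above, with definitions separated by commas.
    """
    index = {cid: i for i, cid in enumerate(chain_ids)}
    n = len(chain_ids)
    matrix = [[0] * n for _ in range(n)]
    for item in custom_contacts.split(","):
        item = item.strip()
        if not item:
            continue
        if "&" in item:
            left, right = item.split("&")
            value = 1
        elif "!" in item:
            left, right = item.split("!")
            value = -1
        else:
            raise ValueError(f"unrecognised contact definition: {item!r}")
        i, j = index[left], index[right]
        matrix[i][j] = value
        matrix[j][i] = value  # interactions are pairwise, so keep symmetry
    return matrix

matrix = build_contact_matrix(["A", "B", "C"], "A&B,A!C")
for row in matrix:
    print(row)
```

Each entry plays the role of $C_{ij}$ in the $V_{olig}$ formula: chain pairs left at 0 contribute nothing, while the sign of a non-zero entry decides whether contacts between the two chains are rewarded or penalized.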

Potentials for Motif Scaffolding

`substrate_contacts`

Overview: The substrate_contacts potential is used for motif scaffolding, where the goal is to design a protein structure around a given functional motif. The potential mimics interactions between the designed protein and a virtual substrate or ligand, encouraging the formation of a binding site that accommodates the motif.

The potential consists of two main components: an attractive term that favors contacts between protein and substrate atoms, and a repulsive term that prevents clashes. The substrate is defined by a set of atoms whose coordinates are provided relative to a reference frame in the designed protein. During the design process, the substrate coordinates are transformed based on the current protein conformation, allowing induced-fit-like adjustments.

Technical Details: The attractive component of the potential is defined using a smooth contact function similar to the monomer_contacts potential: $c(d) = -\frac{1 - \left(\frac{d - d_0}{r_0}\right)^6}{1 - \left(\frac{d - d_0}{r_0}\right)^{12}}$

where $d$ is the distance between a protein atom and a substrate atom, and $d_{0}$ and $r_{0}$ are distance thresholds.

The repulsive component is a soft polynomial potential: $p(d) = \begin{cases} a |r - d|^{1.5},& \text{if } d < r \\ 0, & \text{otherwise} \end{cases}$

where $a$ and $r$ are parameters controlling the strength and distance scale of the repulsion.

The key settings for this potential are:

weight: Scales the overall strength of the potential. A higher weight will prioritize protein-substrate interactions more strongly, potentially at the expense of other structural features. The recommended range is 1-10.
r_0, d_0: Control the distance dependence of the attractive contact term, similar to the monomer_contacts potential. The recommended ranges are 6-12 Å for r_0 and 2-6 Å for d_0.
s: Scaling factor for the attractive contact term. A higher value will strengthen the attraction between protein and substrate atoms. The recommended range is 0.1-2.
rep_r_0: Distance threshold for the repulsive term. Protein-substrate atom pairs closer than this distance will experience a repulsive force. The recommended range is 2-6 Å.
rep_s: Strength of the repulsive term. A higher value will increase the magnitude of the repulsion at short distances. The recommended range is 1-10.
rep_r_min: Minimum distance for the repulsive term. If specified, the repulsive force will only be applied for distances greater than this value, allowing some overlap between protein and substrate atoms. The recommended range is 1-3 Å.

To use the substrate_contacts potential, the user must provide the coordinates of the substrate atoms relative to a reference frame in the protein (e.g. the Cα coordinates of a set of residues). The potential will then transform the substrate coordinates based on the current protein conformation and calculate the interaction energy.

The substrate_contacts potential allows for flexible motif scaffolding by defining a virtual binding site that adapts to the designed protein structure. By adjusting the potential settings, the user can control the strength and specificity of the protein-substrate interactions, as well as the degree of allowed overlap between atoms. The potential can be combined with other potentials, such as monomer_ROG and monomer_contacts, to generate physically realistic scaffolds that accommodate the desired motif.
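
The interplay between the attractive and repulsive terms can be sketched as follows. This is plain Python with hypothetical function names, operating on a list of precomputed minimum protein-substrate distances. It assumes that rep_r_0 corresponds to $r$ and rep_s to $a$ in the formulas above, and it leaves out rep_r_min and the coordinate transformation entirely.

```python
import math

def attractive(d, d_0=6.0, r_0=10.0):
    """c(d): negative switching function, most negative when in contact."""
    x = (d - d_0) / r_0
    if math.isclose(x, 1.0):
        return -0.5  # limit of the 0/0 form for exponents 6 and 12
    return -(1.0 - x**6) / (1.0 - x**12)

def repulsive(d, r=4.0, a=2.0):
    """p(d): soft penalty that is non-zero only when d < r."""
    return a * abs(r - d) ** 1.5 if d < r else 0.0

def substrate_potential(min_dists, weight=3.0, s=1.0, d_0=6.0, r_0=10.0, r=4.0, a=2.0):
    """V_sub = -w * sum_i [ s * c(d_i) + p(d_i) ]."""
    total = sum(s * attractive(d, d_0, r_0) + repulsive(d, r, a) for d in min_dists)
    return -weight * total

# One protein atom at three different distances (in Å) from the substrate
for d in (2.0, 6.0, 30.0):
    print(d, round(substrate_potential([d]), 3))
```

With these settings an atom at 6 Å scores highest, an atom far from the substrate contributes almost nothing, and an atom at 2 Å is pushed strongly negative by the repulsive term. This is the balance that s, rep_r_0, and rep_s are meant to tune.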

Example Usage:

`monomer_ROG`

./scripts/run_inference.py \
    'contigmap.contigs=[100]' \
    inference.output_prefix=outputs/monomer_rog_example \
    inference.num_designs=10 \
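    'potentials.guiding_potentials=["type:monomer_ROG,weight:5,min_dist:15"]'

This command will generate 10 designs of a 100-residue monomeric protein using the monomer_ROG potential. The potential will be applied with a weight of 5 and a minimum distance of 15 Å between each Cα atom and the center of mass. The whole potentials.guiding_potentials argument is wrapped in single quotes so that the shell passes the brackets and inner double quotes through unchanged. Because inference.output_prefix is a file prefix, the output structures will be written to the outputs directory with names beginning with monomer_rog_example (for example, outputs/monomer_rog_example_0.pdb).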

`monomer_contacts`


./scripts/run_inference.py \
    'contigmap.contigs=[100]' \
    inference.output_prefix=outputs/monomer_contacts_example \
    inference.num_designs=10 \
    'potentials.guiding_potentials=["type:monomer_contacts,weight:2,r_0:8,d_0:4"]'

This command will generate 10 designs of a 100-residue monomeric protein using the monomer_contacts potential. The potential will be applied with a weight of 2, an $r_{0}$ value of 8 Å, and a $d_{0}$ value of 4 Å. The output structures will be written to the outputs directory with names beginning with monomer_contacts_example.

`olig_contacts`


./scripts/run_inference.py \
    --config-name symmetry \
    inference.symmetry=c3 \
    'contigmap.contigs=[120]' \
    inference.output_prefix=outputs/olig_contacts_example \
    inference.num_designs=10 \
    'potentials.guiding_potentials=["type:olig_contacts,weight_intra:1,weight_inter:0.5"]' \
    potentials.olig_intra_all=True \
    potentials.olig_inter_all=False \
    'potentials.olig_custom_contact="A&B,A!C"'

This command will generate 10 designs of a 120-residue C3-symmetric oligomer using the olig_contacts potential. The potential will be applied with an intra-chain weight of 1 and an inter-chain weight of 0.5. All intra-chain contacts will be attractive, while specific inter-chain contacts will be defined using the custom contact string "A&B,A!C" (attractive between chains A and B, repulsive between chains A and C). The custom contact argument is single-quoted so that the shell does not interpret the & and ! characters. The output structures will be written to the outputs directory with names beginning with olig_contacts_example.

`substrate_contacts`


./scripts/run_inference.py \
    'contigmap.contigs=[10-20/A10-30/50-60]' \
    inference.input_pdb=motifs/example_motif.pdb \
    inference.output_prefix=outputs/substrate_contacts_example \
    inference.num_designs=10 \
    'potentials.guiding_potentials=["type:substrate_contacts,weight:3,s:1,r_0:10,d_0:6,rep_r_0:4,rep_s:2"]'

This command will generate 10 designs of a protein scaffold for the motif provided in motifs/example_motif.pdb. The scaffold will consist of 10-20 residues N-terminal to the motif, the motif itself (residues 10-30 of chain A), and 50-60 residues C-terminal to the motif. The substrate_contacts potential will be applied with a weight of 3, an attraction strength s of 1, an $r_{0}$ value of 10 Å, a $d_{0}$ value of 6 Å, a repulsive distance threshold rep_r_0 of 4 Å, and a repulsive strength rep_s of 2. The output structures will be written to the outputs directory with names beginning with substrate_contacts_example.

Note: The substrate_contacts potential requires additional setup in the code to define the position of the substrate atoms relative to the motif residues. This setup is not shown in the CLI example.

These examples demonstrate how to use the different potentials via the RFDiffusion command line interface. The specific settings, such as weights and distance thresholds, can be adjusted based on the requirements of the design task. Multiple potentials can also be combined by specifying them as separate strings in the potentials.guiding_potentials list.
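
When several potentials are combined, assembling the argument by hand gets error-prone. The following sketch builds the potentials.guiding_potentials override from Python and prints a shell-safe command line. The helpers potential_string and guiding_potentials_override are hypothetical conveniences written for this post and are not part of RFDiffusion; the settings reuse values from the examples in this post.

```python
import shlex

def potential_string(pot_type, **settings):
    """Format one potential as 'type:NAME,key:value,...'."""
    parts = [f"type:{pot_type}"] + [f"{k}:{v}" for k, v in settings.items()]
    return ",".join(parts)

def guiding_potentials_override(potentials):
    """Build the potentials.guiding_potentials=[...] argument from a list of strings."""
    inner = ",".join(f'"{p}"' for p in potentials)
    return f"potentials.guiding_potentials=[{inner}]"

pots = [
    potential_string("monomer_ROG", weight=3, min_dist=12),
    potential_string("monomer_contacts", weight=1, r_0=8, d_0=4),
]
args = [
    "./scripts/run_inference.py",
    "contigmap.contigs=[100]",
    "inference.output_prefix=outputs/monomer_combined_example",
    "inference.num_designs=10",
    guiding_potentials_override(pots),
]
# shlex.join quotes each argument so that the shell passes brackets and
# inner double quotes through unchanged.
print(shlex.join(args))
```

The printed line can be pasted into a terminal, or the args list can be passed directly to subprocess.run, which bypasses shell quoting altogether.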

Binder Design

Potentials for Binder Design

`binder_ROG`

Overview: The binder_ROG potential is used for designing protein binders. It encourages the model to generate compact structures for the binder region by minimizing the radius of gyration (ROG) of the Cα atoms in the binder. A smaller ROG indicates a more compact binder structure.

Technical Details: The ROG is calculated as the root mean square distance of the Cα atoms in the binder from their center of mass. The potential returns the negative of the ROG, scaled by a weight factor. Guidance favors higher potential values (just as the contact potentials are maximized), so maximizing the negative ROG is equivalent to minimizing the ROG, that is, maximizing the compactness of the binder structure.

The key settings for this potential are:

binderlen: The number of residues in the binder region. This is used to extract the Cα coordinates of the binder from the overall structure.
weight: This parameter scales the overall strength of the potential. A higher weight will prioritize compactness of the binder more strongly, potentially at the expense of other structural features. The recommended range is 1-10, but the optimal value may depend on the specific binder and the other potentials being used.
min_dist: This parameter sets a minimum distance constraint between each Cα atom in the binder and the center of mass. It prevents the binder from collapsing too much and becoming unrealistic. The recommended range is 10-20 Å, based on the typical size of protein binders. Setting min_dist too low may allow overly compact binders, while setting it too high may prevent the potential from having any effect.

`binder_ncontacts`/`interface_ncontacts`

Overview: The binder_ncontacts and interface_ncontacts potentials encourage the formation of contacts within the binder region and at the interface between the binder and the target protein, respectively. They use a differentiable contact definition based on a smooth switching function that goes from 1 (in contact) to 0 (not in contact) as the distance between atoms increases. Maximizing the sum of the switching function over the relevant atom pairs favors structures with more contacts, which can contribute to the stability and specificity of the binder-target interaction.

Technical Details: The switching function is defined as: $s(r_{ij}) = \frac{1 - \left(\frac{r_{ij} - d_0}{r_0}\right)^n}{1 - \left(\frac{r_{ij} - d_0}{r_0}\right)^m}$

The key settings for these potentials are:

binderlen: The number of residues in the binder region. This is used to extract the Cα coordinates of the binder and target from the overall structure.
weight: Scales the overall strength of the potential. A higher weight will prioritize the formation of more contacts, potentially at the expense of other structural features. The recommended range is 1-10.
r_0: This parameter controls the distance at which the switching function transitions from 1 (in contact) to 0 (not in contact). It should be set based on the desired definition of a contact. A smaller value (e.g. 6 Å) will only consider very close atoms to be in contact, while a larger value (e.g. 12 Å) will include more distant interactions. The recommended range is 6-12 Å.
d_0: This parameter sets the distance below which the switching function is always 1 (definitely in contact). It should be smaller than r_0. The recommended range is 2-6 Å.
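
The difference between the two potentials comes down to which atom pairs enter the sum. The sketch below makes that explicit in plain Python. The function names mirror the potential names but are illustrative re-implementations, and the code assumes that the binder occupies the first binderlen residues of the coordinate list, with the target making up the rest.

```python
import math

def switching(r, d_0, r_0):
    x = (r - d_0) / r_0
    # For exponents 6 and 12 the ratio in the text equals 1 / (1 + x**6),
    # which also avoids the 0/0 form at r = d_0 + r_0.
    return 1.0 / (1.0 + x**6)

def binder_ncontacts(ca_coords, binderlen, weight=1.0, r_0=8.0, d_0=4.0):
    """Contacts among binder residues only (assumed to be the first binderlen)."""
    binder = ca_coords[:binderlen]
    total = sum(switching(math.dist(a, b), d_0, r_0) for a in binder for b in binder)
    return weight * total

def interface_ncontacts(ca_coords, binderlen, weight=1.0, r_0=8.0, d_0=4.0):
    """Contacts between binder residues and target residues."""
    binder, target = ca_coords[:binderlen], ca_coords[binderlen:]
    total = sum(switching(math.dist(a, b), d_0, r_0) for a in binder for b in target)
    return weight * total

# Two binder atoms followed by one target atom, then the same with the target far away
near = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
far = near[:2] + [(0.0, 100.0, 0.0)]
print(binder_ncontacts(near, 2), interface_ncontacts(near, 2))
print(binder_ncontacts(far, 2), interface_ncontacts(far, 2))
```

Moving the target away leaves the binder-internal score unchanged and drives the interface score toward zero, which is why the two potentials are usually weighted separately.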

Example Usage:

`binder_ROG`

./scripts/run_inference.py \
    'contigmap.contigs=[B1-100/0 100]' \
    inference.output_prefix=outputs/binder_rog_example \
    inference.num_designs=10 \
    'potentials.guiding_potentials=["type:binder_ROG,binderlen:100,weight:5,min_dist:15"]'

This command will generate 10 designs of a 100-residue protein binder to a target protein (chain B, residues 1-100). The binder_ROG potential will be applied with a weight of 5 and a minimum distance of 15 Å between each Cα atom in the binder and the binder's center of mass. The output structures will be written to the outputs directory with names beginning with binder_rog_example.

`binder_ncontacts` / `interface_ncontacts`

./scripts/run_inference.py \
    'contigmap.contigs=[B1-100/0 100]' \
    inference.output_prefix=outputs/binder_contacts_example \
    inference.num_designs=10 \
    'potentials.guiding_potentials=["type:binder_ncontacts,binderlen:100,weight:2,r_0:8,d_0:4","type:interface_ncontacts,binderlen:100,weight:1,r_0:10,d_0:6"]'

This command will generate 10 designs of a 100-residue protein binder to a target protein (chain B, residues 1-100), using a combination of the binder_ncontacts and interface_ncontacts potentials. The binder_ncontacts potential will be applied with a weight of 2, an $r_{0}$ value of 8 Å, and a $d_{0}$ value of 4 Å, while the interface_ncontacts potential will be applied with a weight of 1, an $r_{0}$ value of 10 Å, and a $d_{0}$ value of 6 Å. The output structures will be written to the outputs directory with names beginning with binder_contacts_example.

These examples demonstrate how to use the binder_ROG, binder_ncontacts, and interface_ncontacts potentials for protein binder design tasks. The specific settings, such as weights and distance thresholds, can be adjusted based on the requirements of the design task and the properties of the target protein. These potentials can also be combined with other potentials, such as monomer_ROG and monomer_contacts, to simultaneously optimize the binder structure and its interaction with the target.

When designing protein binders, it's important to consider the balance between the compactness of the binder (controlled by binder_ROG) and the formation of favorable contacts within the binder and at the interface (controlled by binder_ncontacts and interface_ncontacts). Too much emphasis on compactness may prevent the formation of a complementary binding surface, while overemphasizing contacts may lead to unrealistic or unstable structures.

Iterative design and experimental validation are often necessary to find the optimal combination of potentials and settings for a specific binder design task. It's recommended to start with moderate weights and adjust them based on the results, while considering the trade-offs between different structural features and the desired properties of the binder.

Examples of Combining Multiple Potentials

Here are a few examples of how to combine multiple potentials using the RFDiffusion command line interface:

Example 1: Combining `monomer_ROG` and `monomer_contacts` for unconditional generation

./scripts/run_inference.py \
    'contigmap.contigs=[100]' \
    inference.output_prefix=outputs/monomer_combined_example \
    inference.num_designs=10 \
    'potentials.guiding_potentials=["type:monomer_ROG,weight:3,min_dist:12","type:monomer_contacts,weight:1,r_0:8,d_0:4"]' \
    potentials.guide_scale=5 \
    potentials.guide_decay="linear"

This command will generate 10 designs of a 100-residue monomeric protein using a combination of the monomer_ROG and monomer_contacts potentials. The monomer_ROG potential will be applied with a weight of 3 and a minimum distance of 12 Å, while the monomer_contacts potential will have a weight of 1, an $r_{0}$ value of 8 Å, and a $d_{0}$ value of 4 Å. The overall strength of the guiding potentials will be scaled by a factor of 5, and the influence of the potentials will decay linearly over the course of the design trajectory. The output structures will be written to the outputs directory with names beginning with monomer_combined_example.

Example 2: Combining `olig_contacts` and `monomer_ROG` for symmetric oligomer generation

./scripts/run_inference.py \
    --config-name symmetry \
    inference.symmetry=d2 \
    'contigmap.contigs=[120]' \
    inference.output_prefix=outputs/olig_combined_example \
    inference.num_designs=10 \
    'potentials.guiding_potentials=["type:olig_contacts,weight_intra:1,weight_inter:0.5","type:monomer_ROG,weight:2,min_dist:10"]' \
    potentials.olig_intra_all=True \
    potentials.olig_inter_all=True \
    potentials.guide_scale=3 \
    potentials.guide_decay="quadratic"

This command will generate 10 designs of a 120-residue D2-symmetric oligomer using a combination of the olig_contacts and monomer_ROG potentials. The olig_contacts potential will be applied with an intra-chain weight of 1 and an inter-chain weight of 0.5, with all intra-chain and inter-chain contacts set to attractive. The monomer_ROG potential will be applied to each individual chain with a weight of 2 and a minimum distance of 10 Å. The overall strength of the guiding potentials will be scaled by a factor of 3, and their influence will decay quadratically over the course of the design trajectory. The output structures will be written to the outputs directory with names beginning with olig_combined_example.

Example 3: Combining `substrate_contacts`, `monomer_ROG`, and `monomer_contacts` for motif scaffolding


./scripts/run_inference.py \
    'contigmap.contigs=[10-20/A10-30/50-60]' \
    inference.input_pdb=motifs/example_motif.pdb \
    inference.output_prefix=outputs/scaffold_combined_example \
    inference.num_designs=10 \
    'potentials.guiding_potentials=["type:substrate_contacts,weight:3,s:1,r_0:10,d_0:6,rep_r_0:4,rep_s:2","type:monomer_ROG,weight:1,min_dist:15","type:monomer_contacts,weight:0.5,r_0:8,d_0:4"]' \
    potentials.guide_scale=4 \
    potentials.guide_decay="cubic"

This command will generate 10 designs of a protein scaffold for the motif provided in motifs/example_motif.pdb, using a combination of the substrate_contacts, monomer_ROG, and monomer_contacts potentials. The scaffold will consist of 10-20 residues N-terminal to the motif, the motif itself (residues 10-30 of chain A), and 50-60 residues C-terminal to the motif. The substrate_contacts potential will be applied with a weight of 3 and the specified settings, the monomer_ROG potential will be applied with a weight of 1 and a minimum distance of 15 Å, and the monomer_contacts potential will be applied with a weight of 0.5, an $r_{0}$ value of 8 Å, and a $d_{0}$ value of 4 Å. The overall strength of the guiding potentials will be scaled by a factor of 4, and their influence will decay cubically over the course of the design trajectory. The output structures will be written to the outputs directory with names beginning with scaffold_combined_example.

These examples showcase how multiple potentials can be combined to guide the design process towards structures that satisfy multiple criteria simultaneously. By adjusting the weights and settings of the individual potentials, as well as the overall scaling and decay behavior, the user can fine-tune the balance between different structural features and control the evolution of the design trajectory.
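
To get a feel for how guide_scale and guide_decay interact, the sketch below tabulates a guidance schedule. It assumes that the diffusion step t counts down from T to 0 and that each decay type multiplies guide_scale by a power of t / T. The function guide_scale_at, the "constant" option, and the choice of T = 50 are assumptions made for illustration, so check the codebase for the exact form.

```python
def guide_scale_at(t, T, guide_scale=1.0, guide_decay="constant"):
    """Guidance strength at diffusion step t, where t runs from T down to 0.

    Assumption: each decay type scales guide_scale by a power of t / T.
    """
    powers = {"constant": 0, "linear": 1, "quadratic": 2, "cubic": 3}
    if guide_decay not in powers:
        raise ValueError(f"unknown guide_decay: {guide_decay!r}")
    return guide_scale * (t / T) ** powers[guide_decay]

T = 50  # hypothetical number of diffusion steps
for decay in ("constant", "linear", "quadratic", "cubic"):
    schedule = [round(guide_scale_at(t, T, 5.0, decay), 3) for t in (50, 25, 10, 1)]
    print(decay, schedule)
```

Under this assumption, higher-order decay concentrates the guidance in the early, noisy steps and hands control back to the model sooner, which is one way to limit artifacts from strong potentials late in the trajectory.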

When combining potentials, it's important to consider their relative strengths and how they might interact with each other. For example, using a high weight for the monomer_ROG potential might lead to overly compact structures that don't allow enough room for favorable contacts. Similarly, using a strong substrate_contacts potential without sufficient repulsive forces could result in clashes between the protein and substrate atoms.

Experimentation and iteration are often necessary to find the optimal combination of potentials and settings for a given design task. It's recommended to start with moderate weights and adjust them based on the results, while monitoring the output structures for any undesired artifacts or instabilities.
