SimRNAweb


SimRNA

Documentation


Things to prepare before using SimRNAweb

It is best to gather as much information as possible before using SimRNAweb to do a simulation. SimRNAweb can do de novo folding and can often solve a short hairpin sequence quite successfully. However, most of the interesting RNA structures are complex. Secondary structure information alone may not be adequate to ensure that a correct prediction is obtained. Therefore, it may be necessary to obtain information on tertiary structure.

The more information is available, the more reliable the prediction will be. Nevertheless, when nothing is known, then the only option is to fold the sequence de novo. It should be recognized that the reliability of such predictions depends strongly on how much information can be assembled beforehand.

Example input

In Figures 1a-g, we show various examples of using restraints with sequences and PDB files under various circumstances for PDB id 1L2X.

(1. Sequence alone) When only the sequence is submitted to SimRNAweb, the RNA sequence should be pasted into Box (3):

GGCGCGGCACCGUCCGCGGAACAAACGG

Figure 1a. Submitting sequence alone to SimRNAweb.


(2. Sequence and secondary structure) When the sequence is submitted along with the structure of a pseudoknot (perhaps obtained from Rfam or from SHAPE data), Boxes (3) and (4) are used. The sequence is submitted in Box (3), as above,

GGCGCGGCACCGUCCGCGGAACAAACGG
and the pseudoknot structure is introduced by submitting two lines of secondary structure corresponding to the two overlapping helices in the secondary structure field (Box (4)),

..(((((......)))))..........
.........(((.............)))

Note that a pseudoknot restraint is expressed by placing the entangled helix on an additional line. As many lines as needed can be added to specify deeply entangled pseudoknots. This notation allows more than one pseudoknot to be specified clearly, in contrast to the usual single-line notation "..(((((..[[[.))))).......]]]".


Figure 1b. Submitting sequence along with secondary structure restraints to SimRNAweb.
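
The relationship between the multi-line notation and the usual single-line notation can be illustrated with a short Python sketch. The function name `merge_layers` and the assignment of one bracket style per line are assumptions made here for illustration only; SimRNAweb itself expects the multi-line form in Box (4).

```python
# Hypothetical helper: collapse multi-line dot-bracket input into one line.
# Assumption: line 1 maps to "()", line 2 to "[]", and so on.
BRACKETS = ["()", "[]", "{}", "<>"]


def merge_layers(layers):
    """Return a single-line bracket string built from dot-bracket layers."""
    if not layers:
        raise ValueError("no layers given")
    length = len(layers[0])
    if any(len(layer) != length for layer in layers):
        raise ValueError("all layers must have the same length")
    if len(layers) > len(BRACKETS):
        raise ValueError("more layers than available bracket styles")

    merged = ["."] * length
    for layer, (open_ch, close_ch) in zip(layers, BRACKETS):
        for i, ch in enumerate(layer):
            if ch == ".":
                continue
            if ch not in "()":
                raise ValueError(f"unexpected character {ch!r}")
            if merged[i] != ".":
                raise ValueError(f"position {i + 1} is paired in two layers")
            merged[i] = open_ch if ch == "(" else close_ch
    return "".join(merged)
```

Applied to the two lines shown above for 1L2X, this sketch should reproduce the single-line form quoted in the note.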

(3. Refining a 3D structure) When starting with a structure from a PDB file (e.g., 1l2x_rna_clust0.pdb), upload the PDB file in Box (5) of the submission form. In principle, the secondary structure restraints in Box (4) are not strictly required. In practice, however, unless one wishes to continue (or restart) a simulation from some intermediate point without restraints, there is little reason to submit a correct PDB structure and simulate it unrestrained. Such situations may exist, but in general the purpose would be refinement.

The secondary structure information is entered in Box (4) as before,

..(((((......)))))..........
.........(((.............)))

and the input PDB structure file: 1l2x_rna_clust0.pdb is entered in Box (5).


Figure 1c. Submitting a PDB file along with secondary structure restraints to SimRNAweb.

(4. Including all types of restraints with a PDB file) It is also possible to submit a PDB file (Box (5)) with secondary structure restraints (Box (4)),

..(((((......)))))..........
.........(((.............)))

include the PDB structure file (1l2x_rna_clust0.pdb) for refinement in Box (5), specify frozen residues in the PDB file in Box (6),

A:3-7, 14-18

and upload any additional distance restraints into Box (7) using a text file with the appropriate syntax (e.g., 1l2x_restraints.txt)

SLOPE   A/14/C4' A/7/P    3.0  4.0    1.0
WELL    A/14/C4' A/7/P    3.0  4.0    1.0

NOTE: Please see the last Section of this help document to find more information on formatting and combining distance restraints (including some applied examples).


Figure 1d. Submitting a PDB file to SimRNAweb along with secondary structure restraints, distance restraints, and frozen parts of the structure (based on the PDB file).
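
To make the frozen-residue entry in Box (6) concrete, the following Python sketch expands a string such as "A:3-7, 14-18" into explicit residue numbers. The function name `parse_frozen` is hypothetical, and the rule that a range without a chain prefix belongs to the most recently named chain is an assumption inferred from the example above, not a documented SimRNAweb rule.

```python
def parse_frozen(spec):
    """Expand e.g. 'A:3-7, 14-18' into {'A': [3, 4, 5, 6, 7, 14, ..., 18]}."""
    frozen = {}
    chain = None
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if ":" in item:
            # A new chain prefix, e.g. "A:3-7"
            chain, item = (part.strip() for part in item.split(":", 1))
        if chain is None:
            raise ValueError("the first item must name a chain, e.g. 'A:3-7'")
        start, _, end = item.partition("-")
        first = int(start)
        last = int(end) if end else first  # a single residue such as "A:5"
        if last < first:
            raise ValueError(f"descending range: {item!r}")
        # Assumption: an unprefixed range inherits the previous chain name.
        frozen.setdefault(chain, []).extend(range(first, last + 1))
    return frozen
```

For the entry used in this example, the sketch yields residues 3 to 7 and 14 to 18 of chain A, which are the two strands of the first helix in the secondary structure restraint.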

(5. Including all types of restraints with sequence files) Similar to example (4), when submitting sequence alone with all types of restraints, first paste the RNA sequence

GGCGCGGCACCGUCCGCGGAACAAACGG
into Box (3), paste the secondary structure restraints (with pseudoknot) into Box (4)

..(((((......)))))..........
.........(((.............)))

and, as with the PDB file, the distance restraints should be submitted using the text file (1l2x_restraints.txt), where, in this example, the restraints are


SLOPE   A/14/C4' A/7/P    3.0  4.0    1.0
WELL    A/14/C4' A/7/P    3.0  4.0    1.0

in Box (7).


Figure 1e. Submitting a sequence to SimRNAweb along with secondary structure restraints and distance restraints.

(6. When several missing residues must be inserted) ADVANCED!

SimRNAweb can be used to add missing sequence to structures such as RNase P (PDB id: 3DHS), shown in Figures 2a and 2b. Here, the PDB file presents a molecule with an incomplete sequence due to unresolved fragments.

The PDB file 3DHS presents a single chain (A) with atomic coordinates for residues in the range 1-81, 250-263, 271-351, and 376-414. The PDB file indicates that the following residues are missing: 231-249, 264-270, 352-375, and 415-417, and the missing residue names are indicated. It also indicates missing P atoms for residues 250, 271, 376.

First, a full length sequence of the molecule to be modeled must be reconstructed. This task can be achieved e.g. with a text editor by combining the sequence extracted from the coordinates, with the missing parts inserted or appended terminally in appropriate places. This sequence should be split into two chains corresponding effectively to residue ranges 1-81 and 231-417. As a result, all residues numbered 231-417, originally in chain A, shall become a new chain (B), and be renumbered 1-187 (original number minus 230), as sequence input requires the residue numbering to restart from 1 for every chain.

Second, the PDB file should be prepared. This task can be achieved e.g. with a macromolecular structure viewer/editor such as PyMOL, UCSF Chimera, Swiss PDB Viewer, Rasmol etc. Residues with missing atoms (250, 271, 376) should be removed from the PDB file, and all remaining residues in the range 231-417 (251-263, 272-351, and 377-414) should be renumbered by subtracting 230 (to number them as 21-33, 42-121, and 147-184, respectively), and their chain identifier should be changed from A to B. Only then will the residue numbering in the PDB file correspond to that in the sequence input.
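
As an alternative to editing the file by hand in a structure viewer, the renumbering described above can be sketched in a few lines of Python. This is a minimal sketch that assumes standard fixed-column PDB ATOM/HETATM records (chain identifier in column 22, residue number in columns 23-26). The names `renumber_line` and `renumber_pdb` are hypothetical, and the constants are taken from the 3DHS example in the text.

```python
DROP = {250, 271, 376}   # residues with missing P atoms, removed entirely
OFFSET = 230             # new number = original number - 230
FIRST_IN_CHAIN_B = 231   # residues from here on move to chain B


def renumber_line(line):
    """Return the edited record, or None if the residue is to be dropped."""
    if not line.startswith(("ATOM", "HETATM")):
        return line                      # leave other records untouched
    if line[21] != "A":
        return line                      # only chain A is processed
    resseq = int(line[22:26])
    if resseq in DROP:
        return None
    if resseq >= FIRST_IN_CHAIN_B:
        # Rewrite chain id (column 22) and residue number (columns 23-26).
        return line[:21] + "B" + f"{resseq - OFFSET:4d}" + line[26:]
    return line


def renumber_pdb(lines):
    """Apply renumber_line to every record and discard dropped residues."""
    return [out for out in map(renumber_line, lines) if out is not None]
```

Reading the original file into a list of lines, passing it through `renumber_pdb`, and writing the result to a new file would give the PDB input for Box (5). The result should still be inspected in a viewer before submission.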

The modeling of missing sequences with SimRNAweb should be executed with the residues from the PDB input frozen, and the remaining residues (added from sequence input) free to fold.


Figure 2a. Original PDB structure (PDB id: 3DHS) with the missing sequence. Red residues indicate points where a part of the sequence is missing.

Figure 2b. Example of the result of inserting the missing sequence into the original PDB structure (PDB id: 3DHS).

Detailed description of inputs

1. Format of sequence input files

The input data is written in a single line in a basic text file with the sequence of RNA (both upper and lower cases are acceptable). For example:

AGACUGCUGAGAGACC

There should be nothing but the desired sequence or sequences contained in the file because the program will read everything in the file as though it were part of a sequence. There should be no additional spaces either.

To provide more than one RNA chain as input, the user must separate the different chains by whitespace. For example:

AAGCUA AAAGCUGGGCU
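
The sequence format can be checked before submission with a small Python sketch such as the one below. The function name `parse_sequence_input` is hypothetical, and the restriction to the letters A, C, G, and U is an assumption made for this illustration; the set of symbols actually accepted by the server may differ.

```python
ALLOWED = set("ACGU")  # assumption: plain RNA alphabet only


def parse_sequence_input(text):
    """Split one line of sequence input into upper-case chains."""
    stripped = text.strip()
    if "\n" in stripped:
        raise ValueError("sequence input must be a single line")
    chains = [chain.upper() for chain in stripped.split()]
    if not chains:
        raise ValueError("no sequence found")
    for number, chain in enumerate(chains, start=1):
        unexpected = set(chain) - ALLOWED
        if unexpected:
            raise ValueError(
                f"chain {number} contains unexpected symbols: {sorted(unexpected)}"
            )
    return chains
```

Each element of the returned list corresponds to one chain, with residue numbering restarting from 1 in every chain.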

2. Format of secondary structure restraint files

If the user wants to provide secondary structure restraints, the contents of the restraint file should be in the following one-line format:

.((((......)))).

Or, when pseudoknots are involved, several lines can be provided depending on the characteristics of the pseudoknot. For example, if there is only one pseudoknot, then the following is sufficient:

.((((......)))).
......(........)

If more than one pseudoknot is involved, each one should be written on a separate line. For example, the following structure would involve two pseudoknots:

((((.......))))...............((((.........))))..
......((((.............))))......................
..................((((..............)))).........
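
Because every line must contain balanced brackets and all lines must have the same length, it can help to check a restraint file before submission. The following Python sketch extracts the base pairs from each line and reports unbalanced brackets. The function names `layer_pairs` and `all_pairs` are hypothetical, and the rule that a residue may be paired in only one line is an assumption of this sketch.

```python
def layer_pairs(layer):
    """Return the (i, j) base pairs (1-based) encoded by one dot-bracket line."""
    stack, pairs = [], []
    for pos, ch in enumerate(layer, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def all_pairs(layers):
    """Collect the base pairs from all lines of a restraint file."""
    if len({len(layer) for layer in layers}) > 1:
        raise ValueError("all lines must have the same length")
    pairs = []
    for layer in layers:
        pairs.extend(layer_pairs(layer))
    seen = set()
    for pair in pairs:
        for pos in pair:
            if pos in seen:  # assumption: one partner per residue
                raise ValueError(f"residue {pos} is paired more than once")
            seen.add(pos)
    return sorted(pairs)
```

For the one-pseudoknot example above, the first line contributes four stacked pairs and the second line a single pair that crosses them.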

3. Format of structure input files

The user can start the calculation from a structure supplied in PDB format.

The PDB file is first cleaned with `rna-pdb-tools` (water, ions, and non-standard residues are removed) to make it "SimRNA ready". However, these tools simply follow orders (all muscle and no brain). It is therefore wise to prepare PDB files carefully, so that the automated tools do not misinterpret the intended structural information.

Hence, although a fair amount of flexibility is built into SimRNAweb, there are some important caveats to keep in mind.

4. Restraint command formats

In SimRNA, distance restraints can be defined between any pair of atoms used in the SimRNA representation, as well as the central point of the nucleic acid base: for pyrimidines, P, C4′, N1, C2, and C4; for purines, P, C4′, N9, C2, and C6; and, for both, the middle of the base (label MB).

The restraints come in two basic forms: WELL and SLOPE. A typical WELL restraint effectively "captures" the two beads when they come within a prescribed distance of each other, as in Figure 4a: the penalty is zero when the distance between the orange and red beads is outside the well, and negative when it is inside. In other words, the restraint has no effect until the beads find the well. A typical SLOPE restraint has zero penalty inside the prescribed zone and penalizes any distance outside it, as in Figure 4b.

Figure 4. A cartoon describing the way restraints affect the interaction between two beads: here orange is the reference bead and the red bead is shown moving relative to the orange bead. a) A WELL restraint where outside the hole, the red bead moves freely and inside the hole, it is trapped to a specific range. b) A SLOPE restraint where the penalty is zero when the distance between the beads is inside the zone, and outside this, the beads are either pulled together or pushed away.

The WELL restraint is expressed by a single line in the restraints file as follows:

WELL atom_1_id atom_2_id min_dist max_dist weight

The SLOPE restraint is expressed similarly as follows:

SLOPE atom_1_id atom_2_id min_dist max_dist weight 

or (alternative for SLOPE):

DISTANCE atom_1_id atom_2_id min_dist max_dist weight  

Example lines in a restraints file:

SLOPE A/23/C4' C/45/P 5.5 8.5 1.0
WELL A/23/C4' C/45/P 6.5 7.5 1.0  

where

A/23/C4' means atom C4' of nucleotide 23 in chain A
C/45/P means atom P in nucleotide 45 in chain C

5.5 [Å]: minimal distance at which the penalty is zero (for SLOPE)
6.5 [Å]: minimal distance at which the energy is -1 (for WELL)
7.5 [Å]: maximal distance at which the energy is -1 (for WELL)
8.5 [Å]: maximal distance at which the penalty is zero (for SLOPE)
1.0: weight of the restraint, for both SLOPE and WELL. For WELL, the value is -1 between 6.5 and 7.5 [Å]; for SLOPE, the value increases for distances greater than 8.5 or less than 5.5 [Å].

This is an example of a multifunctional restraint that resembles Figure 5e.
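
The line format described above can be read with a short Python sketch like the following. The names `Restraint`, `parse_atom_id`, and `parse_restraint` are hypothetical, and the sketch assumes that fields are separated by whitespace and that keywords are written in upper case, as in the examples.

```python
from typing import NamedTuple, Tuple


class Restraint(NamedTuple):
    kind: str                      # "SLOPE" or "WELL"
    atom1: Tuple[str, int, str]    # (chain, residue number, atom name)
    atom2: Tuple[str, int, str]
    min_dist: float                # in Angstroms
    max_dist: float
    weight: float


def parse_atom_id(atom_id):
    """Split e.g. "A/23/C4'" into ("A", 23, "C4'")."""
    chain, resnum, atom = atom_id.split("/")
    return chain, int(resnum), atom


def parse_restraint(line):
    """Parse one line of a restraints file into a Restraint."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    kind = fields[0]
    if kind == "DISTANCE":         # documented alternative keyword for SLOPE
        kind = "SLOPE"
    if kind not in ("SLOPE", "WELL"):
        raise ValueError(f"unknown restraint type {fields[0]!r}")
    min_dist, max_dist, weight = map(float, fields[3:])
    if min_dist > max_dist:
        raise ValueError("min_dist must not exceed max_dist")
    return Restraint(kind, parse_atom_id(fields[1]), parse_atom_id(fields[2]),
                     min_dist, max_dist, weight)
```

Parsing both example lines gives two restraints on the same atom pair, which is how a combined restraint is expressed.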


Restraint options

Figure 5. Examples of types of restraints. a) An elementary function of type SLOPE that depends on the following three parameters: the minimum distance, the maximum distance and the slope. b) An elementary function of type WELL that depends on three parameters: the minimum and maximum distance and the depth of the well. c) A single SLOPE function with negative weight. d) A single WELL function with negative weight. e) A combination of one SLOPE function and one WELL function. f) A combination of three SLOPE and two WELL functions, in which one WELL function and one SLOPE function have negative weights.


A restraint can be thought of as a flexible tether that drives the selected atoms towards a certain distance by applying a penalty for distances that deviate from that range. It can also provide a reward when a desired distance is achieved. The penalty and reward are positive and negative contributions to the total energy of the simulated system, respectively.

There are two types of distance restraints: WELL and SLOPE (the keyword DISTANCE can also be used in place of SLOPE). Both describe an inner zone between the specified minimum and maximum distances (which can be the same number). For SLOPE, the penalty within the inner zone is zero, and outside the zone it changes linearly with distance, with a slope set by the weight. When the weight is positive, the SLOPE forms a "V" shape or a "\_/" shape depending on the size of the inner zone (Figure 5a). If the weight is negative, the shape is inverted, but the penalty in the inner zone is still zero. In the case of WELL, a positive weight produces a divot "|_|": outside the zone the value is zero, and inside the well the energy is negative (Figure 5b). Changing the sign of the weight for WELL creates a wall or stumbling block (a positive penalty) in the inner zone.

The basic forms for SLOPE and WELL, where the weight is positive, are shown in Figure 5a and b.

In the case of a SLOPE-type restraint, the two atoms are tethered towards the region by applying a linear penalty that corresponds to the degree of violation of the distance from the desired region. When the distance between the atoms becomes equal to the desired value, the value of the function reaches zero. The shape of the function resembles a V, with a bottom that can correspond to a single point or to a “flat” region (Figure 5a).

In case of a WELL-type restraint, the function is flat and equals zero for any distance outside the desired range, while the distances within the desired range correspond to the negative value of the weight (Figure 5b).

Both the SLOPE and WELL restraint functions can be “reversed” when a negative weight is applied (cf. Figures 5c and d). For example, a SLOPE-type function with a negative weight can be used as a repelling function (Figure 5c). This can be useful, for example, in simulations in which the user wishes to study the stretching of a molecule between its terminal residues. On the other hand, the WELL-type function can be applied with a negative weight when the user wishes to define a distance range that the atoms should avoid. When atoms are in the distance range specified by WELL, an additional penalty is applied (Figure 5d).

Any number of these two types of functions can be also combined in order to define complex restraints. The resulting function can adopt various shapes (Figures 5e and f). Therefore, the relative distance of two atoms under consideration can be described by a dedicated function or a linear combination of functions used as a part of the total scoring of the energy.

For both types of restraints, the user must specify the restraints in subsequent lines of the restraint file in the following format

ChX/i/atom_i ChY/j/atom_j min_dist max_dist weight

where examples are provided in the next section. Here, ChX and ChY specify the chain (the chain names can be the same or different), i and j are the indices of the residues on the respective chains, and atom_i and atom_j are the atoms to which the restraint is applied. Restraints of type WELL are 0 outside the range between min_dist and max_dist. Within that range, the value is -1*weight.

Restraints of type SLOPE are 0 within the range from min_dist to max_dist. Outside that range, a linearly increasing positive penalty is added (the pair of atoms is drawn towards the region between min_dist and max_dist because the penalty decreases as they approach it). The value of the penalty is dist_violation*weight.

Restraints for a given pair of atoms can be combined (added). It requires two (or more) lines to specify subsequent contributions.
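
The two restraint functions and their combination can be written out as a minimal Python sketch. The function names are hypothetical, and treating the boundaries min_dist and max_dist as part of the inner zone is an assumption of this sketch; the text above does not say how the exact boundary values are handled.

```python
def slope_energy(dist, min_dist, max_dist, weight):
    """SLOPE: zero inside [min_dist, max_dist], dist_violation*weight outside."""
    if dist < min_dist:
        violation = min_dist - dist
    elif dist > max_dist:
        violation = dist - max_dist
    else:
        violation = 0.0
    return violation * weight


def well_energy(dist, min_dist, max_dist, weight):
    """WELL: -1*weight inside [min_dist, max_dist], zero outside."""
    if min_dist <= dist <= max_dist:   # assumption: boundaries are inclusive
        return -1.0 * weight
    return 0.0


def combined_energy(dist, restraints):
    """Sum the contributions of several (kind, min, max, weight) tuples."""
    functions = {"SLOPE": slope_energy, "WELL": well_energy}
    return sum(functions[kind](dist, lo, hi, w) for kind, lo, hi, w in restraints)


# The combined SLOPE + WELL restraint from the example lines earlier:
example = [("SLOPE", 5.5, 8.5, 1.0), ("WELL", 6.5, 7.5, 1.0)]
```

With these values, a distance inside the well is rewarded, a distance between the well and the edge of the SLOPE zone contributes nothing, and a distance beyond 8.5 Å or below 5.5 Å is penalized in proportion to the violation.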

For an applied example of using the above described restraints, Figure 6 shows two examples where the secondary structure restraints and distance restraints (WELL and SLOPE) were used to improve the fit of the 3D structure for two tRNA structures. The necessary restraint files and sequence for folding 3L0U are also provided as an example on the submission page of SimRNAweb, where the best structure using this combination of restraints was 6.1 Angstroms RMSD (Figure 6a). Figure 6b provides the requisite files for PDB structure 1EVV, where SimRNAweb produced one cluster with 6.5 Angstroms RMSD.


Figure 6a. Results of folding the sequence for PDB id 3L0U with secondary structure restraints and distance restraints using SLOPE and WELL. The native structure is shown in green. The blue region shows where the distance restraints were applied. The tRNA structure has modified bases in this region of the structure.

Figure 6b. Results of folding the sequence for PDB id 1EVV with secondary structure restraints and distance restraints using SLOPE and WELL. The native structure is shown in green. Here also, the tRNA structure has modified bases in this region of the structure.

Submit options

Job title – letters, digits, and the characters +_-.() are allowed. The first 20 characters of the title form the job id, so pick something short, sweet, and to the point.

E-mail address – optional; if given, it is used to send the results. However, for long sequences the simulation can take many hours, and if the results page is lost, there is no other way to find it again.

RNA sequence – see above

Secondary structure – see above


Workflow of modeling your sequence using a fragment of the tetraloop motif

This is a more advanced demo of how SimRNAweb can be used, to give you a feel for the usability of the server.

Prepare a PDB file used as a fragment to model a tetraloop motif of your sequence

To model your RNA sequence of interest (CGCGCAAGCG) using a tetraloop PDB motif taken from the PDB database, first we have to prepare the PDB file.
   XXXX    <- use the tetraloop from 1zih to model your RNA
1234567890 # numbering of your RNA sequence
CGCGCAAGCG # your RNA sequence
(((....))) # secondary structure of your RNA sequence
Start PyMOL (https://www.pymol.org/) and fetch a structure 1zih (fetch 1zih).

Remember, it is absolutely essential to examine the PDB file (e.g. with PyMOL), check for missing parts, gaps (or unusual numbering), ions, water, or anything else that looks suspicious.

Note that this PDB structure has hydrogen atoms; please remove them (Action -> hydrogen -> remove). Select the tetraloop ('GCAA'), extract it to a new object, and rename this object (set_name 1zih, tetraloop). Since the numbering of the tetraloop differs slightly from that in the RNA sequence, it must be shifted by 1 (the 5th residue becomes the 4th, etc.) using alter (sele), resv-=1. Note that this PDB file has 10 states, and we need only one to use with SimRNAweb. The easiest way to get rid of the additional states is to save the molecule to a new PDB file. Now you can use this file to model your sequence.
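
For readers who prefer a script to the interactive PyMOL steps, the removal of hydrogens and of the extra states can also be sketched in plain Python. This is an alternative to the workflow described above, not part of it. The function name `first_model_heavy_atoms` is hypothetical, and the sketch assumes standard fixed-column PDB records with the element symbol in columns 77-78.

```python
def first_model_heavy_atoms(pdb_lines):
    """Keep the ATOM records of the first model only, skipping hydrogens."""
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "ENDMDL":
            break                       # ignore every model after the first
        if record != "ATOM":
            continue                    # drop MODEL, HETATM, REMARK, ...
        element = line[76:78].strip().upper()
        name = line[12:16].strip()
        # Fall back to the atom name when the element column is empty.
        is_hydrogen = element == "H" or (
            not element and name.lstrip("0123456789").startswith("H")
        )
        if not is_hydrogen:
            kept.append(line)
    return kept
```

The selection of the tetraloop residues and the shift of the residue numbers by 1 would still have to be applied, for example with the PyMOL commands given above.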

SimRNAweb

Now you are ready to run SimRNAweb. Type a sequence, secondary structure and residues to freeze (i.e., residues that should not move during the simulation).
seq: CGCGCAAGCG
secondary structure: (((....)))
freeze: A:4-7
upload the file tetraloop.pdb and click Submit!

The result is in http://iimcb.genesilico.pl/SimRNAweb/jobs/4371aff4-cea1-4deb-88ce-fbb9dc6ae5a4.

Happy prediction!