For template-based modeling, a FASTA file containing two sequences needs to be prepared. The first is the sequence of the model to be built. The second is the template sequence, which must be identical to the sequence of the template structure. It can be retrieved by the
> target sequence
GDCGGD--DAGAAUACCUGCCUQUCACGCAGGG-----CGGGTPCGAGU
> template sequence
GDCGGDCUDAGAAUA----CCUQUCACGCA--GG7UCGCGGGTPCGAGU
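Before modeling, it can help to confirm that the alignment file really contains two rows of equal length. The following is a minimal sketch in plain Python; the helper names `read_fasta` and `check_pairwise_alignment` are hypothetical and not part of ModeRNA, and the length check assumes that every residue, including modified ones, is written as a single character.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_pairwise_alignment(text):
    """Return (target_row, template_row) after basic sanity checks."""
    records = read_fasta(text)
    if len(records) != 2:
        raise ValueError("expected 2 sequences, found %d" % len(records))
    target, template = records[0][1], records[1][1]
    if len(target) != len(template):
        raise ValueError("aligned rows differ in length")
    return target, template


ALIGNMENT = """> target sequence
GDCGGD--DAGAAUACCUGCCUQUCACGCAGGG-----CGGGTPCGAGU
> template sequence
GDCGGDCUDAGAAUA----CCUQUCACGCA--GG7UCGCGGGTPCGAGU
"""

target, template = check_pairwise_alignment(ALIGNMENT)
# The template row without gaps is what has to match the template structure.
ungapped_template = template.replace("-", "")
print(len(target), ungapped_template)
```

The ungapped template row is the string that has to agree with the sequence read from the template structure.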
The Template Structure
Any RNA structure from the PDB may be used for modeling. The file should be prepared to contain a single model (if not, the first will be taken by default), and alternative conformations should be removed. You should know the chain identifier of the RNA chain you want to use as a template (A is used by default). Protein chains and non-RNA hetero groups are ignored. ModeRNA uses the remediated PDB nomenclature, e.g. single quotes (') instead of asterisks (*) in the atom names of the ribose.
Preparation of PDB files
For each RNA template used, ModeRNA checks whether the sequence from the structure is the same as the corresponding sequence in the alignment. To check what the exact sequence of your template is, the
The most common cause of problems when modeling RNA with ModeRNA is a template that is not compatible with the alignment. Possible reasons include:
1. Backbone discontinuity
A_AAAA
A-AGAA
ModeRNA can try to fix a broken backbone, which helps in many cases. See the
2. Single ligands, ions or water molecules
AAAA_.
AAAA--
(see 3. if there are more such symbols).
3. Multiple water or ion molecules
4. Protein chain
5. Missing or broken bases.
All of the features described above need to be included in the alignment, or the template structure needs to be modified.
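To locate where a template and an alignment disagree, the sequence read from the structure can be compared with the template row of the alignment after its gaps are removed. The sketch below is plain Python and does not use ModeRNA; the function name `first_mismatch` is hypothetical, and it assumes, as the short examples above suggest, that special features of the structure appear as extra symbols such as `_` or `.` in the sequence string.

```python
def first_mismatch(structure_seq, alignment_row, gap="-"):
    """Return the 0-based position of the first difference, or None.

    Positions refer to the sequence read from the structure. Gap
    characters in the alignment row are ignored.
    """
    ungapped = alignment_row.replace(gap, "")
    for position, (from_structure, from_alignment) in enumerate(
            zip(structure_seq, ungapped)):
        if from_structure != from_alignment:
            return position
    # All shared positions agree; one string may still be longer.
    if len(structure_seq) != len(ungapped):
        return min(len(structure_seq), len(ungapped))
    return None


# The structure has an extra '_' symbol that the alignment row lacks.
print(first_mismatch("A_AAAA", "AAAAA"))
# Trailing symbols in the structure are missing from the alignment row.
print(first_mismatch("AAAA_.", "AAAA"))
# Gaps in the alignment row do not count as differences.
print(first_mismatch("AAAA", "AA--AA"))
```

A result of `None` means the two strings agree; any other value points to the first symbol that has to be added to the alignment or removed from the structure.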
One may check whether a PDB file (or alignment) needs to be cleaned up.
PDB file modification can be done with the
Automatic model building
When building a model
After automatic model building, ModeRNA checks whether the sequences in the alignment and the template structure are equal. This includes checking for backbone discontinuities caused by sub-optimal loop insertion. In any case, a detailed report will be written to the
ModeRNA is capable of recognizing and editing 115 different base modifications, which have been taken from the Modomics database. For each modification, a name, an abbreviation, a one-letter code, and a molecular structure are given.
A complete list of modification names and abbreviations can be found in Modomics or in the file
> Target containing several modifications as one-letter abbreviations
ACUGUGAYUA[U UACCU#PG
> Template containing the modification 015U
GCGGA----UUU015UALCUCAG
For adding modifications to standard bases, a set of 67 small fragments has been created. Each such fragment contains atoms belonging to a modification, and a triple of connecting atoms which are used to fit the fragment to an existing standard base. These connecting atoms are given in the data/modification_rules file. For removing modifications, a list of atom names which are to be removed is given in the same file.
Recognition of modified nucleotides
The recognition of modified bases is done in two steps. First, if the name of a residue is one of A, G, C, or U, it is checked whether all and only the atoms of this nucleotide are present. If this is not the case, the base is assumed to be modified. In the second step, modified bases are detected by matching subgraphs in the molecular topology of that residue. For all modifications, such subgraphs have been defined in a SMILES-like notation in such a way that all of them can be unambiguously distinguished.
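The first recognition step, the "all and only" atom check, can be illustrated with a minimal sketch. This is not ModeRNA's implementation: the names `BACKBONE`, `BASE_ATOMS` and `looks_standard` are hypothetical, only heavy atoms in the remediated PDB nomenclature are considered, and terminal variants (such as an additional OP3 atom) are ignored. The second step, subgraph matching, is not shown.

```python
# Heavy atoms of the ribose-phosphate backbone (remediated PDB names).
BACKBONE = {"P", "OP1", "OP2", "O5'", "C5'", "C4'", "O4'",
            "C3'", "O3'", "C2'", "O2'", "C1'"}

# Heavy atoms of the four standard bases.
BASE_ATOMS = {
    "A": {"N9", "C8", "N7", "C5", "C6", "N6", "N1", "C2", "N3", "C4"},
    "G": {"N9", "C8", "N7", "C5", "C6", "O6", "N1", "C2", "N2", "N3", "C4"},
    "C": {"N1", "C2", "O2", "N3", "C4", "N4", "C5", "C6"},
    "U": {"N1", "C2", "O2", "N3", "C4", "O4", "C5", "C6"},
}


def looks_standard(resname, atom_names):
    """True if the residue is named A, G, C or U and carries exactly
    the expected heavy atoms (no atom missing, no atom extra)."""
    if resname not in BASE_ATOMS:
        return False
    expected = BACKBONE | BASE_ATOMS[resname]
    return set(atom_names) == expected


uridine = BACKBONE | BASE_ATOMS["U"]
print(looks_standard("U", uridine))            # complete standard residue
print(looks_standard("U", uridine | {"C5M"}))  # one extra atom
print(looks_standard("U", uridine - {"O4"}))   # one atom missing
```

Any residue for which such a check fails would then be passed on to the second, topology-based step.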
The loop search procedure in ModeRNA tries to find the best fitting
loop from a library of 88000 RNA fragments. The search is done in two
steps: A fast one based on geometrical parameters, and a slower one
based on the exact fit of a smaller set of loop candidates. In the
first step, the similarity of seven parameters derived from the
residues at both ends of the fragment is compared to the same
parameters of the residues in the model, between which a linker is to
be found. These parameters involve three distances, two angles, and
three dihedrals (two of which enforce a proper orientation of the
bases). Additionally, the sequence similarity of the linker and the
desired fragment is taken into account. A description of this approach
can be found in
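The idea of a fast pre-filter followed by a slower exact fit can be sketched in a few lines of plain Python. This is only an illustration under strong simplifications and not ModeRNA's actual procedure: the names `parameter_distance`, `rmsd` and `two_step_search` are hypothetical, the parameters are compared with a plain Euclidean distance (ignoring units and the periodicity of angles), sequence similarity is left out, and the exact fit is an RMSD without superposition, assuming that the candidate and query coordinates are already in a common frame.

```python
import math


def parameter_distance(a, b):
    """Euclidean distance between two equal-length parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points,
    computed without any superposition."""
    total = sum((p - q) ** 2
                for point_a, point_b in zip(coords_a, coords_b)
                for p, q in zip(point_a, point_b))
    return math.sqrt(total / len(coords_a))


def two_step_search(query_params, query_anchor, library, shortlist=2):
    """Pick a fragment in two steps: cheap filter, then exact fit."""
    # Step 1 (fast): rank all fragments by parameter similarity.
    ranked = sorted(
        library,
        key=lambda frag: parameter_distance(frag["params"], query_params))
    candidates = ranked[:shortlist]
    # Step 2 (slow): evaluate the exact fit only for the shortlist.
    return min(candidates,
               key=lambda frag: rmsd(frag["anchor"], query_anchor))


# Toy library with made-up numbers.
library = [
    {"name": "frag1", "params": (5.0, 6.0, 7.0),
     "anchor": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]},
    {"name": "frag2", "params": (5.1, 6.1, 7.0),
     "anchor": [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)]},
    {"name": "frag3", "params": (9.0, 9.0, 9.0),
     "anchor": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]},
]
query_params = (5.0, 6.0, 7.0)
query_anchor = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)]

best = two_step_search(query_params, query_anchor, library)
print(best["name"])
```

The point of the design is that the expensive comparison is only run on the few candidates that survive the cheap filter, which matters for a library of this size.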
ModeRNA provides a set of functions that can be applied to manipulate secondary structure in the model. The length of helices can be changed and a second strand can be added to a single stranded base. Entire secondary structure fragments provided by a user can be inserted into the structure and, while searching for a missing fragment, ModeRNA can take secondary structure restraints into account. Moreover, the secondary structure can be calculated from the tertiary structure of a model or template. The detailed description of all functions for secondary structure modeling can be found on the Secondary structure page.
The base pairs used by ModeRNA are calculated according to the Leontis/Westhof definition. Currently, only the canonical Watson-Crick and Wobble base pairs are used. ModeRNA internally defines three edges in each base (the Watson, Hoogsteen, and Sugar edge), similar to the RNAView program, and tries to recognize the 12 Leontis/Westhof types of noncanonical base pairs. The latter part is, however, subject to debugging and therefore not publicly advertised (if you are still interested, look for ModernaStructure.get_base_pairs()).
Stacking of bases is calculated according to the procedure described by (Gendron P, Lemieux S, Major F. JMB 2001;308(5):919-36). It essentially defines a range for the angle between the normals of two aromatic rings, their distance, and the horizontal displacement of the rings.
This definition of base stacking is not restricted to adjacent bases, but provides stacking for any pair of bases. This procedure has been implemented as part of ModeRNA. ModeRNA calculates the base stacking for the template and the model structure to detect differences between the two, that is, interactions that may have been lost during the modeling.
Interatomic clashes are detected between any pair of atoms that do not belong to the same residue. A clash is defined as an overlap of the atomic radii. The atomic radius depends on the element.
In ModeRNA, a set of smaller radii and a set of considerably bigger VdW radii have been implemented. Which of them is used is specified in the find_clashes function. The atom radii used can be found in the following table:
The bond between O3’ atom of the preceding and the P atom of the following residue is always exempt from clash recognition. Please note that the hydrogen atoms are not considered, unless they are explicitly contained in the structure.
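The clash definition above can be written down as a short plain-Python sketch. It is not ModeRNA's find_clashes function: the name `find_clashes_sketch`, the data layout, and in particular the radii in `RADII` are hypothetical values chosen for illustration only. Residues are identified by their position in the chain so that the O3'-P exemption can be expressed as "next residue".

```python
import math

# Hypothetical radii in Angstrom, for illustration only.
# These are NOT the values used by ModeRNA.
RADII = {"C": 0.8, "N": 0.75, "O": 0.7, "P": 1.0}


def is_backbone_link(a, b):
    """True for the O3' of one residue and the P of the following one."""
    for first, second in ((a, b), (b, a)):
        if (first["name"] == "O3'" and second["name"] == "P"
                and second["index"] == first["index"] + 1):
            return True
    return False


def find_clashes_sketch(atoms, radii=RADII):
    """Return pairs of (residue index, atom name) whose radii overlap."""
    clashes = []
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if a["index"] == b["index"]:
                continue  # atoms of the same residue are never compared
            if is_backbone_link(a, b):
                continue  # the O3'-P bond is exempt
            limit = radii[a["element"]] + radii[b["element"]]
            if math.dist(a["xyz"], b["xyz"]) < limit:
                clashes.append(((a["index"], a["name"]),
                                (b["index"], b["name"])))
    return clashes


# Made-up coordinates: a bonded O3'/P pair and one atom that is too close.
atoms = [
    {"index": 1, "name": "O3'", "element": "O", "xyz": (0.0, 0.0, 0.0)},
    {"index": 2, "name": "P", "element": "P", "xyz": (1.6, 0.0, 0.0)},
    {"index": 2, "name": "OP1", "element": "O", "xyz": (1.6, -1.4, 0.0)},
    {"index": 3, "name": "C4'", "element": "C", "xyz": (1.6, 1.0, 0.0)},
]
clashes = find_clashes_sketch(atoms)
print(clashes)
```

Hydrogen atoms would only take part in such a check if they were present in the atom list and had an entry in the radius table.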
There exist several definitions of unified atoms, where the hydrogens are represented by a larger atom radius of the attached heavy atom (e.g. the ProtOr set). These radii sets conflict with the tightly packed structure of RNA and lead to clashes even in correct structures. We therefore decided not to use them.
Once ModeRNA has produced a model, it may be necessary to refine the local geometry. This is mostly the case when no optimal candidate could be found for a loop. In that case, the backbone conformation will contain chemically unreasonable bond lengths, angles and dihedrals. Refinement may also be necessary to optimize the conformation of base pairs. Both issues are addressed by refinement programs.
MMTK is a program library that performs energy minimization using Conjugate Gradient minimization. It uses the AMBER force field. ModeRNA provides a script to perform minimization of models using MMTK. Alternatively, it is also possible to use a different Molecular Dynamics program like NAMD, or a force field like CHARMM (but for these, no such script exists). Some commercial packages like HyperChem are also capable of performing energy minimization of RNA.
refine_model_mmtk.py -m my_model.pdb -c A -r 50-55 -y 1000 -o optimized.pdb
This means that residues 50-55 of chain A in my_model.pdb will be optimized in 1000 cycles (if no chain id is specified, the script assumes that chain A should be refined), and the entire structure containing the optimized fragment will be saved to optimized.pdb. Note that the residue numbers given after '-r' must correspond to positions in the linear RNA sequence, and not to the numbering of residues in the PDB file (because a residue id can contain a letter, e.g. 20A).
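Because the '-r' option counts residues along the chain, PDB residue identifiers have to be translated into linear positions first. A minimal sketch of that translation follows; the function name `linear_range` and the example identifiers are hypothetical, and positions are counted from 1.

```python
def linear_range(pdb_ids, first_id, last_id):
    """Translate two PDB residue identifiers into a 'start-end' string
    of 1-based positions in the linear sequence."""
    start = pdb_ids.index(first_id) + 1
    end = pdb_ids.index(last_id) + 1
    if start > end:
        raise ValueError("first residue comes after last residue")
    return "%d-%d" % (start, end)


# Residue identifiers in chain order, including insertion codes.
pdb_ids = ["18", "19", "20", "20A", "20B", "21", "22"]

# Residues 20A to 21 are positions 4 to 6 of the linear sequence.
print(linear_range(pdb_ids, "20A", "21"))
```

The resulting string is the kind of value that would be passed after '-r'.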
It is also possible to optimize an entire model with
refine_model_mmtk.py -m my_model.pdb -y 1000 -o optimized_fragment.pdb
Important notice! In this example a whole RNA chain is refined. If the first residue contains an OP3 atom (which is a characteristic feature of the first residue at the 5' end), then, because of a bug in MMTK, the program returns a Python IOError. In that case, the first residue can be excluded from the refined range:
refine_model_mmtk.py -m my_model.pdb -y 1000 -r 2-78 -o optimized_fragment.pdb
where 78 corresponds to the 78th residue in the linear sequence, in this case the residue at the 3' end of the RNA.
Modified nucleotides are not part of the standard AMBER forcefield. They are therefore excluded from the refinement by the MMTK script. A set of quantum mechanical parameters for 107 modified nucleotides has been published by (Aduri R et al. J Chem Theory Comput, 2007, 3(4), 1464–1475). They can be included in the AMBER parameter files to allow modifications to be refined in the same way.