GDFuzz3D help: protein 2D map to 3D structure retrieval service

Computation time

Server calculation time grows approximately with the square of the amino acid sequence length. A typical calculation for a sequence of up to 100 aa takes up to 2 hours. If the sequence is shorter than 200 aa, the calculation may take up to 8 hours. For input sequences longer than 300 aa, please expect the results to take more than 24 hours to compute.

CASP RR format

This is a tabular, PDB-like format introduced in the CASP RR contest. A detailed description and an example file can be found on the CASP format webpage. An example of a complete contact map file is also available, and the detailed file format specification can be found in Appendix A of the linked PDF document. The main section of the file consists of contact records of the following form:

i j d1 d2 p

where, for contact maps with an 8 Angstrom cutoff, the d1 and d2 values are constant and equal to 0 and 8, respectively (they are ignored in calculations); i and j are residue identifiers (i<j); and p is the contact probability (a real value between 0 and 1).
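To illustrate the record layout above, here is a minimal Python sketch that parses one contact record and checks the stated constraints (i<j, p between 0 and 1). It is not part of GDFuzz3D or PROTMAP2D; the names `ContactRecord` and `parse_contact_record` are hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class ContactRecord(NamedTuple):
    i: int      # first residue identifier
    j: int      # second residue identifier (i < j)
    d1: float   # lower distance bound (0 for 8 Angstrom maps)
    d2: float   # upper distance bound (8 for 8 Angstrom maps)
    p: float    # contact probability, between 0 and 1


def parse_contact_record(line: str) -> ContactRecord:
    """Parse one 'i j d1 d2 p' line (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    i, j = int(fields[0]), int(fields[1])
    d1, d2, p = (float(x) for x in fields[2:])
    if not i < j:
        raise ValueError(f"residue identifiers must satisfy i < j: {line!r}")
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability must lie between 0 and 1: {line!r}")
    return ContactRecord(i, j, d1, d2, p)


record = parse_contact_record("3 27 0 8 0.85")
print(record.i, record.j, record.p)
```

The d1 and d2 fields are parsed but not validated here, since they are ignored in calculations.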

An additional mandatory record is the amino acid sequence. The PROTMAP2D program uses SEQRES records for this purpose. Each row containing the amino acid sequence (single-letter code) should start with the SEQRES tag, followed by two whitespace characters and then up to 50 amino acid letters. If the sequence is longer than 50 aa, additional SEQRES records should follow to provide the complete protein sequence. It is also possible to submit a contact map to GDFuzz3D with raw sequence lines, without the SEQRES tag. In either case, the sequence data in the contact map text should be placed before the contact data. Only the 20 standard amino acid letters are allowed in GDFuzz3D. If the sequence contains modified amino acids, they should be changed into standard ones, because GDFuzz3D can perform secondary structure prediction only on sequences coded with standard amino acid letters.
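The SEQRES layout described above can be sketched in Python as follows. This is only an illustration under the rules stated in this section (tag, two spaces, up to 50 letters per row, 20 standard letters only); the function name `to_seqres_records` is hypothetical and is not part of GDFuzz3D or PROTMAP2D.

```python
# The 20 standard amino acid one-letter codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def to_seqres_records(sequence: str, width: int = 50) -> list[str]:
    """Wrap a one-letter sequence into SEQRES rows (hypothetical helper)."""
    # Drop any whitespace or line breaks and normalise to upper case.
    seq = "".join(sequence.split()).upper()
    non_standard = sorted(set(seq) - STANDARD_AA)
    if non_standard:
        raise ValueError(f"non-standard amino acid letters: {non_standard}")
    # "SEQRES" tag, two spaces, then up to `width` letters per row.
    return ["SEQRES  " + seq[k:k + width] for k in range(0, len(seq), width)]


for row in to_seqres_records("ACDEFGHIKLMNPQRSTVWY" * 3):
    print(row)
```

For the 60-residue example sequence this prints two rows, the first holding 50 letters and the second the remaining 10.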

The format also requires the mandatory MODEL and END tags, which mark the start and the end of the contact section, respectively. Other auxiliary records, such as PFRMAT RR, REMARK, METHOD and AUTHOR, can be omitted if necessary. Please note that the PROTMAP2D program requires the MODEL tag to be placed after the sequence record. Also, omitting the amino acid sequence in SEQRES records prevents real-time manipulation of the final fuzzy map in PROTMAP2D; the map can then only be saved to an output format.
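The ordering rules above (sequence first, then MODEL, then contact records, then END) can be sketched in Python. This is a minimal illustration, not the PROTMAP2D writer: the function name `build_rr_text` is hypothetical, the exact form of the MODEL line ("MODEL 1") is an assumption based on the CASP convention, and d1 and d2 are fixed to 0 and 8 as for 8 Angstrom contact maps.

```python
def build_rr_text(sequence_lines, contacts, model_line="MODEL 1"):
    """Assemble a contact map file in the order described above.

    sequence_lines: ready-made sequence rows (SEQRES or raw sequence lines)
    contacts: iterable of (i, j, p) tuples with i < j and 0 <= p <= 1
    model_line: assumed form of the MODEL tag line
    """
    lines = ["PFRMAT RR"]             # optional auxiliary record
    lines.extend(sequence_lines)      # sequence goes before the MODEL tag
    lines.append(model_line)          # start of the contact section
    for i, j, p in sorted(contacts):
        lines.append(f"{i} {j} 0 8 {p:.3f}")
    lines.append("END")               # end of the contact section
    return "\n".join(lines) + "\n"


print(build_rr_text(["SEQRES  ACDEFGHIKL"], [(2, 9, 0.5), (1, 8, 0.72)]), end="")
```

Sorting the contacts is a cosmetic choice made for this sketch; the text above does not state that records must be ordered.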

Several methods output a contact map in an unspecified, matrix-like format. In such a case, the user can:

- transform the map to the Clans format
- then use the PROTMAP2D program to encode it in the CASP format that GDFuzz3D needs as input

Alternatively, you can use the Transform matrix tool, which is part of this service, to put the matrix-like data into a usable format.
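As a rough illustration of what such a conversion involves, the Python sketch below turns a matrix into contact record lines. It does not reproduce the Transform matrix tool or PROTMAP2D. Since the matrix-like formats are unspecified, it assumes a square, whitespace-separated matrix of contact probabilities with one row per residue and 1-based residue numbering; the function name `matrix_to_contact_records` and the `min_p` threshold are hypothetical.

```python
def matrix_to_contact_records(matrix_text: str, min_p: float = 0.0) -> list[str]:
    """Convert a square probability matrix to 'i j d1 d2 p' lines.

    Assumes whitespace-separated values, one matrix row per text line.
    Only the upper triangle is read, so every record satisfies i < j.
    """
    rows = [[float(x) for x in line.split()]
            for line in matrix_text.splitlines() if line.strip()]
    n = len(rows)
    if any(len(row) != n for row in rows):
        raise ValueError("matrix must be square")
    records = []
    for a in range(n):
        for b in range(a + 1, n):
            p = rows[a][b]
            if p > min_p:  # skip pairs at or below the chosen threshold
                # d1 and d2 fixed to 0 and 8 (8 Angstrom cutoff).
                records.append(f"{a + 1} {b + 1} 0 8 {p:.3f}")
    return records


example = "0 0.9 0.1\n0.9 0 0.4\n0.1 0.4 0"
for line in matrix_to_contact_records(example, min_p=0.3):
    print(line)
```

The resulting lines still need the sequence record and the MODEL and END tags before they form a complete input file.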

Gap handling

Please be careful when using third-party protein contact map prediction programs. As GDFuzz3D is aimed at retrieving 3D models of single-domain protein structures, it is strongly recommended not to:

- feed a multi-domain protein to any third-party contact map prediction program before using GDFuzz3D
- perform any preprocessing of the predicted map and the sequence (e.g. indel introduction)
- feed a gapped (structure-based) sequence to any third-party 2D map prediction program if a PDB template is known (especially off the N- and C-termini)

Useful links

Generic:
- General information about contact maps (article from Wikipedia)

- CASP home page

- PROTMAP2D - tool for calculation, visualization and comparison of contact maps

- PyMOL - a molecular visualization system (useful for 3D model visualization and comparison)

- IPC - protein Isoelectric Point Calculator

Some contact map predictors which can be used to generate input for GDFuzz3D:
- MULTICOM (use SVMcom or DNcon)
- CMAPpro
- SVMcon

Some contact map predictors which output a matrix-like 2D map (see: CASP RR section):
- NNcon
- XX-Stout
- SPAM

Another similar 2D → 3D retrieval method, available as a server:
- FT-COMAR

Publicly available third-party programs used in GDFuzz3D:
- Modeller - program for comparative protein structure modelling

- Orange (MDS) - data mining toolbox in Python

- SSPro4 - protein secondary structure prediction

- MMTSB - Tool Set (incl. rebuild)

Contact: Michal J. Pietal, Lukasz P. Kozlowski, Janusz M. Bujnicki,
Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw
