GDFuzz3D: protein 2D map to 3D structure retrieval service

Computation time

Server calculation time grows approximately with the square of the amino acid sequence length. A typical calculation for a sequence of up to 100 aa takes up to 2 hours. If the sequence length is below 200 amino acids, the calculation may take up to 8 hours. For input sequences longer than 300 aa, please expect the results to take more than 24 hours to compute.
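The figures above can be turned into a small lookup, for example to warn a user before submitting a job. This is only a sketch: the function name expected_runtime is hypothetical, and lengths between 200 and 300 aa return None because no figure is stated for that range.

```python
from typing import Optional


def expected_runtime(length: int) -> Optional[str]:
    """Return the runtime stated on this page for a sequence length in aa.

    Returns None for lengths the page gives no figure for (200 to 300 aa).
    """
    if length <= 0:
        raise ValueError("sequence length must be positive")
    if length <= 100:
        return "up to 2 hours"
    if length < 200:
        return "up to 8 hours"
    if length > 300:
        return "more than 24 hours"
    return None  # range not covered by the stated figures


if __name__ == "__main__":
    for n in (80, 150, 250, 400):
        print(n, expected_runtime(n))
```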

CASP RR format

This is a tabular, PDB-like format introduced in the CASP RR contest. A detailed description and an example file can be found on the CASP format webpage. An example of a complete contact map file can also be found here, and the detailed file format specification is given in Appendix A of this pdf document. The main section of the file consists of contact records of the following form:

  i  j  d1  d2  p

where i and j are residue identifiers (i<j) and p is the contact probability (a real value between 0 and 1). For contact maps with an 8 Angstrom cutoff, the d1 and d2 values are constant and equal to 0 and 8 respectively (they are ignored in calculations).
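As an illustration of the record layout described above, a single contact line could be parsed and checked as follows. This is a minimal sketch and not code from GDFuzz3D or PROTMAP2D; the names Contact and parse_contact_record are hypothetical.

```python
from typing import NamedTuple


class Contact(NamedTuple):
    i: int      # first residue identifier
    j: int      # second residue identifier (i < j)
    d1: float   # lower distance bound (0 for an 8 Angstrom map)
    d2: float   # upper distance bound (8 for an 8 Angstrom map)
    p: float    # contact probability in [0, 1]


def parse_contact_record(line: str) -> Contact:
    """Parse one whitespace-separated 'i j d1 d2 p' record."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    i, j = int(fields[0]), int(fields[1])
    d1, d2, p = (float(x) for x in fields[2:])
    if not i < j:
        raise ValueError(f"residue identifiers must satisfy i < j: {line!r}")
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability must be between 0 and 1: {line!r}")
    return Contact(i, j, d1, d2, p)


if __name__ == "__main__":
    print(parse_contact_record("  3  17  0  8  0.75"))
```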

An additional mandatory record is the amino acid sequence. The PROTMAP2D program uses SEQRES records for this purpose. Each row containing the amino acid sequence (single-letter code) should start with the SEQRES tag, followed by two white space characters and then up to 50 amino acid letters. If the sequence is longer than 50 aa, additional subsequent SEQRES records should be used to provide the complete sequence of the protein. It is also possible to input a contact map into GDFuzz3D containing raw sequence lines, without the SEQRES tag. In either case, the sequence data should be placed before the contact data in the contact map text. Only the 20 standard amino acid letters are allowed in GDFuzz3D. If the sequence contains modified amino acids, they should be changed into standard ones, as GDFuzz3D can perform secondary structure prediction only on sequences coded with standard amino acid letters.

The format requires two additional mandatory tags, MODEL and END, which denote the start and end of the contact section respectively. Other auxiliary records such as PFRMAT RR, REMARK, METHOD, AUTHOR can be omitted if necessary. Please note that the PROTMAP2D program requires that the MODEL tag be placed after the sequence record. Also, omitting the amino acid sequence wrapped in the SEQRES tag prevents the final fuzzy map from being manipulated in real time in PROTMAP2D; it can then only be saved to an output format.
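Putting the rules above together, the following sketch assembles a minimal input file: SEQRES rows of at most 50 letters, then the MODEL tag, the contact records, and END. It is an illustration only. The function name format_rr is hypothetical, and the exact MODEL line ("MODEL  1", as commonly seen in CASP RR files) is an assumption, so it is exposed as a parameter.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acid letters


def format_rr(sequence, contacts, model_line="MODEL  1", width=50):
    """Build contact map text from a sequence and (i, j, p) tuples.

    model_line is an assumption about how the MODEL tag is written.
    d1 and d2 are fixed to 0 and 8, i.e. an 8 Angstrom cutoff map.
    """
    sequence = sequence.upper()
    bad = sorted(set(sequence) - STANDARD_AA)
    if bad:
        raise ValueError(f"non-standard amino acid letters: {bad}")
    lines = []
    # Sequence first: SEQRES tag, two spaces, up to `width` letters per row.
    for start in range(0, len(sequence), width):
        lines.append("SEQRES  " + sequence[start:start + width])
    # The MODEL tag must come after the sequence record.
    lines.append(model_line)
    for i, j, p in sorted(contacts):
        if not i < j or not 0.0 <= p <= 1.0:
            raise ValueError(f"invalid contact: {(i, j, p)}")
        lines.append(f"{i} {j} 0 8 {p:.3f}")
    lines.append("END")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_rr("ACDEFGHIKLMNPQRSTVWY" * 3, [(1, 5, 0.9), (2, 40, 0.35)]), end="")
```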

Several methods output a contact map in an unspecified, matrix-like format. In such a case, the user can

Alternatively, you can use the Transform matrix tool, which is part of this service, to convert the matrix-like data into a usable format.
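For readers who prefer to convert such data locally, the idea can be sketched as below. This is not the implementation of the Transform matrix tool. It assumes the input is a square, whitespace-separated matrix of contact probabilities with residues numbered from 1; the function name matrix_to_records and the min_p parameter are hypothetical.

```python
def matrix_to_records(matrix_text, min_p=0.0):
    """Convert a square probability matrix into 'i j 0 8 p' records.

    Only the upper triangle (i < j) is read, and entries with a
    probability not greater than min_p are skipped.
    """
    rows = [
        [float(x) for x in line.split()]
        for line in matrix_text.splitlines()
        if line.strip()
    ]
    n = len(rows)
    if any(len(row) != n for row in rows):
        raise ValueError("matrix must be square")
    records = []
    for a in range(n):
        for b in range(a + 1, n):
            p = rows[a][b]
            if p > min_p:
                # Residue identifiers are 1-based in the contact records.
                records.append(f"{a + 1} {b + 1} 0 8 {p:g}")
    return records


if __name__ == "__main__":
    example = "0 0.5 0\n0.5 0 0.25\n0 0.25 0\n"
    print("\n".join(matrix_to_records(example)))
```

The resulting records still need the sequence, MODEL and END lines described in the CASP RR format section before they form a complete input file.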

Gap handling

Please be careful when using third-party protein contact map prediction programs. As GDFuzz3D is aimed at retrieving 3D models of single-domain protein structures, it is at least strongly suggested not to

Useful links

- General information about contact maps (Wikipedia article)
- CASP home page
- PROTMAP2D - tool for calculation, visualization and comparison of contact maps
- PyMOL - a molecular visualization system (useful for 3D model visualisation and comparison)
- IPC - protein Isoelectric Point Calculator

Some contact map predictors which can be used to generate input for GDFuzz3D:
- MULTICOM (use SVMcom or DNcon)
- CMAPpro
- SVMcon

Some contact map predictors which output a matrix-like 2D map (see: CASP RR section):
- NNcon
- XX-Stout

Other similar 2D → 3D retrieval method: available as a server

Publicly available third-party programs used in GDFuzz3D:
- Modeller - program for comparative protein structure modelling
- Orange (MDS) - data mining toolbox in Python
- SSPro4 - protein secondary structure prediction
- MMTSB - Tool Set (incl. rebuild)

Contact: Michal J. Pietal, Lukasz P. Kozlowski, Janusz M. Bujnicki,
Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw