Bujnicki lab - Statistical geometry algorithm implementation in Python

SOFTWARE
- DATABASES
  REPAIRtoire
  
  REPAIRtoire is an online database for systems biology of DNA damage and repair. The database collects and organizes the following types of information: (i) DNA damage linked to environmental mutagenic and cytotoxic agents, (ii) pathways comprising individual processes and enzymatic reactions involved in the removal of damage, (iii) proteins participating in DNA repair and (iv) diseases correlated with mutations in genes encoding DNA repair proteins. The DNA repair and tolerance pathways are represented as graphs and in tabular form with descriptions of each repair step and corresponding proteins, and individual entries are cross-referenced to supporting literature and primary databases. In addition, a tool for drawing custom DNA-protein complexes is available online.
  
  MODOMICS
  
  MODOMICS is the first comprehensive database system for systems biology of RNA modification. It integrates information about the chemical structure of modified nucleosides, their localization in RNA sequences, pathways of their biosynthesis and enzymes that carry out the respective reactions (including the available protein sequence and structure data). It also provides literature information and links to other databases. The database can be queried by the type of nucleoside (e.g. A, G, C, U, I, m1A, nm5s2U etc.), type of RNA, position of a particular nucleoside, type of reaction (e.g. methylation, thiolation, deamination, etc.), and name or sequence of an enzyme of interest. Options for data presentation include graphs of pathways involving the query nucleoside, multiple sequence alignments of RNA sequences and tabular forms with enzyme and literature data. The contents of MODOMICS can be accessed through the World Wide Web at https://genesilico.pl/modomics/
  
  RNA Bricks
  
  RNA Bricks is a database of RNA 3D structure motifs and their contacts, both with themselves and with proteins. The database provides structure-quality score annotations and tools for the RNA 3D structure search and comparison.
  
  RNArchitecture
  
  RNArchitecture is a database that provides a comprehensive description of relationships between known families of structured non-coding RNAs, with a focus on structural similarities. The classification is hierarchical and similar to the system used in the SCOP and CATH databases of protein structures. Its central level is Family, which builds on the Rfam catalog, and gathers closely related RNAs. Consensus structures of Families are described with a reduced secondary structure representation. Evolutionarily related Families are grouped into Superfamilies. Similar structures are further grouped into Architectures. The highest level, Class, organizes Families into very broad structural categories, such as simple or complex structured RNAs. Some groups at different levels of the hierarchy are currently labeled as “unclassified”. For each Family with an experimentally determined 3D structure(s), a representative one is provided. RNArchitecture also presents theoretical models of RNA 3D structure and is open for submission of structural models by users. Compared to other databases, RNArchitecture is unique in its focus on structure-based RNA classification, and in providing a platform for storing RNA 3D structure predictions. RNArchitecture can be accessed at https://iimcb.genesilico.pl/RNArchitecture/.
  
  *The classification is expected to evolve as new data become available and we would like to encourage all interested parties to submit their suggestions for improvement as well as 3D structure predictions*
- STAND-ALONE
  PROTMAP2D
  
  Two-dimensional maps of contacts summarize interactions between amino acids in the structure. They reveal characteristic patterns of interactions between secondary and super-secondary structures and are very attractive for visual analysis. The overlap of the residue contact maps of two structures can be easily calculated, providing a sensitive measure of protein structure similarity.
  
  ModeRNA
  
  We developed a method for 3D homology modeling of RNA structures. It requires a pairwise sequence alignment and a structural template to generate a 3D structural model of the target RNA sequence, using either a fully automated or a script-based approach. ModeRNA is capable of handling 115 different nucleotide modifications and of bridging gaps using fragments derived from an extensive fragment library.
  
  RNAmap2D
  
  RNAmap2D is a software tool for calculating contact and distance maps based on user-defined criteria, for (to some extent) quantitative comparison of pairs or series of contact maps, and for visualizing the results.
  
  FILTREST3D
  
  Filtrest3D is a program for discriminating between a large number of alternative models of protein structure or protein-ligand structure by checking them against a set of restraints derived from low-resolution experimental analyses (such as cross-linking, mutagenesis, circular dichroism etc.) as well as from computational predictions (e.g. solvent accessibility, amino acid contact maps).
  
  PyRy3D
  
  PyRy3D is a software tool for modeling the structures of large macromolecular complexes. It uses Monte Carlo simulation to sample conformational space and to identify the best fit of the structures of complex components into a density map. The complex-building process is guided by distance restraints derived from experiments.
  
  Statistical potentials for RNA-protein docking
  
  We developed two medium-resolution, knowledge-based potentials for scoring protein-RNA models obtained by docking: the quasi-chemical potential (QUASI-RNP) and the Decoys As the Reference State potential (DARS-RNP). Both potentials use a coarse-grained representation for both RNA and protein molecules and are capable of dealing with RNA structures with posttranscriptionally modified residues. In our tests that compared these methods to other published potentials, DARS-RNP showed the highest ability to identify native-like structures.
  
  SimRNA
  
  SimRNA is a tool for simulations of RNA conformational dynamics, including RNA 3D structure prediction.
  
  SupeRNAlign
  
  SupeRNAlign is a tool for flexible superposition of RNA 3D structures. Our program implements an iterative algorithm that splits RNA structures into fragments and superimposes them using existing tools (currently R3D Align). Finally, the superimposed structures are saved to a .pdb file and a sequence alignment is generated.
  
  QRNAS
  
  QRNAS is an extension of the AMBER simulation method with additional terms associated with explicit hydrogen bonds, co-planarity of base pairs, backbone regularization, and custom restraints. QRNAS is capable of handling RNA, DNA, chimeras and hybrids thereof, and enables modeling of nucleic acids containing modified residues.
  
  ClaPNAC
- SERVERS
  SimRNAweb
  
  SimRNAweb is a web server interface to the SimRNA method for RNA 3D structure modeling.
  
  MetaServer
  
  The GeneSilico MetaServer is a gateway to a number of third-party methods for protein structure prediction (identification of domains, secondary structure prediction, fold-recognition, and finally, 3D model generation). Users can submit a protein sequence (or alignment) by a single click, then analyze the summary of results generated by many methods, and finally predict the protein structure according to the "consensus" approach.
  
  FILTREST3D
  
  This is an "alpha" version of a server for discriminating between a large number of alternative models of protein structure by checking them against a set of restraints derived from low-resolution experimental analyses (such as cross-linking, mutagenesis, circular dichroism etc.) as well as from computational predictions (e.g. solvent accessibility, amino acid contact maps).
  
  CompaRNA
  
  The CompaRNA web server provides a continuous benchmark of automated standalone and web server methods for RNA secondary structure prediction. It has been inspired by the EVA and Livebench servers for benchmarking of protein structure prediction tools, which have greatly contributed to progress in the field of structural bioinformatics. The aim of CompaRNA is to assess the state of the art in RNA structure prediction, provide a detailed picture of what is possible with the available tools, where progress is being made and what major problems remain. The CompaRNA server is a valuable resource for all researchers who focus their attention on the usage and development of RNA structure prediction methods.
  
  ModeRNA server
  
  ModeRNA server is an online tool for RNA 3D structure modeling by the comparative approach, based on a template RNA structure and a user-defined target-template sequence alignment. It offers an option to search for potential templates, given the target sequence. The server also provides tools for analyzing, editing and formatting of RNA structure files. It facilitates the use of the ModeRNA software and offers new options in comparison to the standalone program.
  
  MetalionRNA
  
  MetalionRNA is a web server for the prediction of metal ions (magnesium, sodium, and potassium) in RNA 3D structures, based on a statistical potential inferred from the analysis of binding sites observed in experimentally solved RNA structures. The server is also capable of predicting Mg2+-binding sites for DNA structures.
  
  MetaLocGramN
  
  MetaLocGramN is a method for predicting the subcellular localization of proteins from Gram-negative bacteria. It is a gateway to a number of primary prediction methods of various types (signal peptide, beta-barrel, transmembrane helix and subcellular localization predictors), integrates them, and provides an overall consensus prediction based on their outputs. Predictions for a protein sequence can be made through the submission form or a SOAP client. In our benchmark, MetaLocGramN performed better than other subcellular localization (SCL) prediction methods: its average Matthews correlation coefficient reached 0.806, improving predictive performance by 12% compared to PSORTb3.
  
  RIBER/DIBER
  
  Co-crystallization experiments of proteins with nucleic acids do not guarantee that both components are present in the crystal. While working as a PhD student with Matthias Bochtler, Grzegorz Chojnowski developed DIBER - a method with which to predict crystal content (DNA? protein? or both?) from the diffraction data. Now, we have together developed RIBER, which should be used when protein and RNA are in the crystallization drop. The combined RIBER/DIBER suite builds on machine learning techniques to make reliable, quantitative predictions of crystal content for non-expert users and high throughput crystallography. RIBER/DIBER requires diffraction data to at least 3.0 Å resolution in MTZ or CIF format.
  
  MinkoFit3D
  
  MinkoFit3D is a method for fitting macromolecular assemblies or their components into electron density maps. Our approach is based on finding “tight passages” inferred from the Minkowski sum boundary of two polyhedral surfaces of the structure and its map. Following the initial fit, either a robust brute-force or a genetic algorithm is used to build an initial assembly, and multi-body refinement is applied in direct electron density space.
  
  Metadisorder
  
  MetaDisorder is a meta-server for the prediction of intrinsically disordered regions in protein sequences. It integrates the results of a number of primary disorder prediction methods into a consensus prediction.
  
  QA-RecombineIT
  
  QA-RecombineIt provides a web interface to assess the quality of protein 3D structure models and to improve the accuracy of models by merging fragments of multiple input models. In the first stage (QA-mode), our server predicts the global quality of input models and provides estimates of local quality as the deviation between C-α atoms in the models and corresponding atoms in the unknown native structure. Together with the input models, these predictions subsequently become the input for the second stage (RecombineIt-mode), in which fragments predicted to be better than others are judiciously combined to generate hybrid (consensus) models. Finally, hybrid models are scored by the MQAPs implemented in the QA-mode and then presented to the user.
  
  NPDock
  
  NPDock is a web server for modeling of protein-RNA and protein-DNA complex structures. It combines GRAMM for global macromolecular docking, scoring with a statistical potential, clustering of best-scored structures, and local refinement.
  
  ClaRNA
  
  ClaRNA is a new method for computational classification of contacts in RNA 3D structures. Unique features of the program are the ability to identify imperfect contacts and to process coarse-grained models. Each doublet of spatially close ribonucleotide residues in a query structure is compared to clusters of reference doublets obtained by analysis of a large number of experimentally determined RNA structures, and assigned a score that describes its similarity to one or more known types of contacts, including pairing, stacking, base–phosphate and base–ribose interactions.
  
  BrickworX
  
  The BrickworX web server builds RNA and DNA models into electron density maps at resolutions as low as 4 Å. As input, the program requires an MTZ file with structure factor amplitudes and phases.
  
  GDFuzz3D
  
  GDFuzz3D is a method for protein 3D structure modeling, which starts from the protein sequence and a map of contacts between amino acid residues. It can handle predicted maps that contain e.g., probabilities of contacts between all residues, and a high fraction of incorrectly predicted contacts. It is not appropriate for predicting structures based on maps with only a few predicted contacts.
  
  tRNAmodpred
  
  The program takes as input a set of unmodified tRNA sequences and a set of protein sequences corresponding to the proteome of a cell. It identifies all RNA residues that correspond to known modification sites with known enzymes, finds homologs of known tRNA modification enzymes, and maps the predictions onto known pathways of RNA modification to identify theoretically possible modification reactions for all positions in the query tRNAs.
  
  SupeRNAlign
  
  SupeRNAlign is a tool for flexible superposition of RNA 3D structures. Our program implements an iterative algorithm that splits RNA structures into fragments and superimposes them using existing tools (currently R3D Align). Finally, the superimposed structures are saved to a .pdb file and a sequence alignment is generated.
- TOOLKIT
  TOOLKITHOME
  
  MODELLER
  
  MetaMQAP
  
  REFINER
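The PROTMAP2D entry in the list above notes that the overlap of the residue contact maps of two structures is a sensitive measure of structural similarity. The Python sketch below illustrates that idea under assumptions that the original does not state: each residue is represented by one coordinate (e.g. its Cα atom), a contact is a pair of residues closer than 8 Å and at least three positions apart in sequence, and overlap is the Jaccard index of the two contact sets. `contact_map` and `contact_overlap` are hypothetical names, and PROTMAP2D's own cutoffs and overlap definition may differ.

```python
import math
from itertools import combinations
from typing import FrozenSet, Sequence, Tuple

# Hypothetical sketch of contact-map overlap; not PROTMAP2D code.
Coord = Tuple[float, float, float]
ContactMap = FrozenSet[Tuple[int, int]]


def contact_map(coords: Sequence[Coord], cutoff: float = 8.0,
                min_separation: int = 3) -> ContactMap:
    """Return residue pairs (i, j), i < j, closer than `cutoff` angstroms.

    Pairs fewer than `min_separation` positions apart in sequence are
    skipped because sequence neighbours are trivially close in space.
    """
    return frozenset(
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_separation and math.dist(coords[i], coords[j]) < cutoff
    )


def contact_overlap(map_a: ContactMap, map_b: ContactMap) -> float:
    """Jaccard index of two contact maps: shared contacts / all contacts."""
    union = map_a | map_b
    # Two empty maps are treated as identical (assumed convention).
    return len(map_a & map_b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Toy hairpin-like chain and a variant with the last residue moved away.
    hairpin = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0),
               (7.6, 5.0, 0), (3.8, 5.0, 0), (0, 5.0, 0)]
    variant = hairpin[:5] + [(0, 9.0, 0)]
    a, b = contact_map(hairpin), contact_map(variant)
    print(sorted(a))               # [(0, 4), (0, 5), (1, 4), (1, 5)]
    print(contact_overlap(a, b))   # 0.5
```

Moving a single residue removes two of the four contacts, so the overlap drops to 0.5; a score of 1.0 would mean identical contact patterns.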
Funded grants
- FNP (TEAM)
  Modeling of dynamic interactions between RNA and small molecules and its practical applications (POIR.04.04.00-00-3CF0/16-00); 3 449 541 PLN; 2017-2020. PI: J.M.Bujnicki, vice-PI: F.Stefaniak
- NCN (MAESTRO)
  Integrative modeling and structure determination of macromolecular complexes comprising RNA and proteins (2017/26/A/NZ1/01083); 3 500 000 PLN; 2018-2023. PI: J.M.Bujnicki, vice-PI: N.Chandran
- NCN (MINIATURA)
  Photoswitchable ligands for riboswitches (2018/02/X/NZ1/01468); 21 670 PLN; 2018-2019. PI: F.Stefaniak
- NCN (OPUS)
  Development of new methods for designing RNA molecules that fold into desired spatial structures and their use for development of new functional RNAs and for prediction of noncoding RNAs in transcriptome sequences (2017/25/B/NZ2/01294); 1 494 250 PLN; 2018-2021. PI: J.M.Bujnicki, vice-PI: T.Wirecki

RNA has recently emerged as an attractive target for new drug development. Our team is developing new methods to study the interactions between RNA and ligands. Recently, we have developed a new machine learning method called AnnapuRNA to predict how small chemical molecules interact with structured RNA molecules. The research was published in PLoS Comput Biol. 2021 Feb 1;17(2):e1008309. doi: 10.1371/journal.pcbi.1008309.


About the Laboratory of Bioinformatics and Protein Engineering

Our group is involved in theoretical and experimental research on nucleic acids and proteins. The current focus is on RNA sequence-structure-function relationships (in particular 3D modeling), RNA-protein complexes, and enzymes acting on RNA.

We study the rules that govern the sequence-structure-function relationships in proteins and nucleic acids and use the acquired knowledge to predict structures and functions for uncharacterized gene products, to alter the known structures and functions of proteins and RNAs and to engineer molecules with new properties.

Our key strength is in the integration of various types of theoretical and experimental analyses. We develop and use computer programs for modeling of protein three-dimensional structures based on heterogeneous, low-resolution, noisy and ambiguous experimental data. We are also involved in genome-scale phylogenetic analyses, with the focus on identification of proteins that belong to particular families. Subsequently, we experimentally characterize the function of the most interesting new genes/proteins identified by bioinformatics. We also use theoretical predictions to guide protein engineering, using rational and random approaches. Our ultimate goal is to identify complete sets of enzymes involved in particular metabolic pathways (e.g. RNA modification, DNA repair) and to design proteins with new properties, in particular enzymes with new useful functions that have not been observed in nature.

We are well equipped for both theoretical and experimental analyses. Our lab offers an excellent environment for training young researchers in both bioinformatics and the molecular biology/biochemistry of protein-nucleic acid interactions.


Published: Thursday, 12 December 2013

Statistical geometry algorithm implementation in Python

Tomasz Puton, Sandra Smit, Kristian Rother, Jaap Heringa & Janusz M. Bujnicki

This is a Python implementation of the statistical geometry in sequence space algorithm, for both binary and quaternary sequence spaces.
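The binary variant works on sequences reduced to a two-letter alphabet. The short sketch below assumes the purine/pyrimidine (R/Y) reduction for DNA and RNA; `to_binary` is a hypothetical helper written for illustration and is not part of the stat_geo package.

```python
# Hypothetical helper: reduce nucleotide sequences to a binary alphabet.
# Assumption: purines (A, G) -> 'R', pyrimidines (C, T, U) -> 'Y';
# gaps and ambiguous symbols are kept unchanged so that such columns
# can be filtered out later.
PURINE_PYRIMIDINE = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def to_binary(seq: str) -> str:
    """Map a DNA/RNA sequence onto the purine/pyrimidine (R/Y) alphabet."""
    return "".join(PURINE_PYRIMIDINE.get(base, base) for base in seq.upper())


if __name__ == "__main__":
    print(to_binary("GGAUCcu-"))  # RRRYYYY-
```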

It is mainly applied in biological sequence analysis in an evolutionary context, e.g. for evaluating evolutionary models.

The algorithm allows you to check the divergence pattern of a given sequence alignment, i.e. whether your sequences (RNA, DNA or protein) follow a tree-like or a bundle-like pattern of divergence. This is the main capability of the library. Performing this test shows whether a tree can meaningfully be built for a set of sequences: if they follow bundle-like divergence, building a tree does not make sense at all. The library also allows you to check how randomized the various positions in an alignment of many related sequences are, and therefore to conclude which positions are constrained during evolution. This can be done by splitting the alignment positions into two separate sequence alignments and then measuring the divergence within each group.
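The quartet-based idea behind the tree-like versus bundle-like test can be illustrated with a simplified Python sketch. For every quartet of aligned sequences it counts the columns supporting each of the three possible pairings (pattern xxyy); a tree-like quartet has one dominant pairing, whereas a bundle-like one supports all three about equally. The score (a - b) / (a + b + c), computed from the sorted counts a ≥ b ≥ c, and all function names are hypothetical illustrations: they are not the published statistical geometry parameters or the stat_geo API, and columns with gaps or more than two distinct symbols are simply ignored.

```python
from itertools import combinations
from typing import Iterable, List, Sequence, Tuple

# Simplified, hypothetical sketch; not the stat_geo implementation.
# The three ways of splitting a quartet (positions 0..3) into two pairs.
SPLITS = (((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2)))


def split_counts(quartet: Sequence[str]) -> List[int]:
    """Count columns supporting each 2|2 split (pattern xxyy) of a quartet.

    Columns with gaps, invariant columns, 1|3 columns and columns with
    more than two distinct symbols are ignored (a simplification).
    """
    counts = [0, 0, 0]
    for column in zip(*quartet):
        if "-" in column or len(set(column)) != 2:
            continue
        for n, ((i, j), (k, l)) in enumerate(SPLITS):
            if column[i] == column[j] and column[k] == column[l]:
                counts[n] += 1
                break
    return counts


def tree_likeness(counts: Sequence[int]) -> float:
    """Hypothetical score: 1.0 = one dominant split, 0.0 = no dominant split."""
    a, b, c = sorted(counts, reverse=True)
    total = a + b + c
    return 0.0 if total == 0 else (a - b) / total


def mean_tree_likeness(alignment: Sequence[str]) -> float:
    """Average the quartet score over all quartets of aligned sequences."""
    if len(alignment) < 4:
        raise ValueError("at least four sequences are required")
    scores = [tree_likeness(split_counts(q)) for q in combinations(alignment, 4)]
    return sum(scores) / len(scores)


def split_columns(alignment: Sequence[str],
                  selected: Iterable[int]) -> Tuple[List[str], List[str]]:
    """Split an alignment into (selected columns, remaining columns)."""
    chosen = set(selected)
    inside = ["".join(c for p, c in enumerate(s) if p in chosen) for s in alignment]
    outside = ["".join(c for p, c in enumerate(s) if p not in chosen) for s in alignment]
    return inside, outside


if __name__ == "__main__":
    aln = ["AAAAAAA", "AAAAAGG", "GGGGGAG", "GGGGGGA"]
    print(split_counts(aln))                   # [5, 1, 1]
    print(round(mean_tree_likeness(aln), 3))   # 0.571
    tree_part, rest = split_columns(aln, range(4))
    print(mean_tree_likeness(tree_part), mean_tree_likeness(rest))  # 1.0 0.0
```

In this toy alignment the first four columns all support the same pairing, while the last three each support a different one. Splitting the columns into two groups with `split_columns` separates a perfectly tree-like part from a bundle-like part, which is the kind of per-group comparison the paragraph above describes.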

The original description of the statistical geometry algorithm in sequence space can be found in the paper:
http://www.ncbi.nlm.nih.gov/pubmed/3413065

And an example analysis here:
http://www.ncbi.nlm.nih.gov/pubmed/2497522

A very good starting point for understanding the algorithm is also the review by Kay Nieselt-Struwe in Biophysical Chemistry, which covers statistical geometry in sequence space and all its variants:
http://www.ncbi.nlm.nih.gov/pubmed/9362556

Download
Download source code - stat_geo_1.0.zip