Statistical geometry algorithm implementation in Python
Tomasz Puton, Sandra Smit, Kristian Rother, Jaap Heringa & Janusz M. Bujnicki
An implementation, written in Python, of the statistical geometry in sequence space algorithm, covering both binary and quaternary sequence spaces.
It is mainly used in biological sequence analysis in an evolutionary context, for example to evaluate evolutionary models.
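For nucleotide sequences, binary and quaternary sequence spaces differ only in the alphabet. The binary space is commonly obtained by reducing each nucleotide to a purine (R) or a pyrimidine (Y), while the quaternary space keeps A, C, G and T/U. The sketch below shows that reduction together with the Hamming distance on aligned sequences. `to_binary` and `hamming` are hypothetical helpers written for illustration, not functions of the library. Passing gaps and other characters through unchanged is also an assumption.

```python
# Hypothetical helpers, not the library's API: reduce a nucleotide
# sequence to binary (R/Y) sequence space and compare aligned sequences.
PURINES = set("AG")
PYRIMIDINES = set("CTU")


def to_binary(seq):
    """Return the purine/pyrimidine (R/Y) form of a nucleotide sequence.

    Gaps and any other characters are kept unchanged (an assumption), so
    the caller decides how to treat them later.
    """
    out = []
    for ch in seq.upper():
        if ch in PURINES:
            out.append("R")
        elif ch in PYRIMIDINES:
            out.append("Y")
        else:
            out.append(ch)
    return "".join(out)


def hamming(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


print(to_binary("ACGU-u"))                             # RYRY-Y
print(hamming("ACGU", "GCAU"))                         # 2
print(hamming(to_binary("ACGU"), to_binary("GCAU")))   # 0
```

Both differences in the example are A/G transitions. They count in quaternary space but disappear after the R/Y reduction, which is why the two spaces can give different pictures of the same alignment.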
The algorithm assesses how the sequences in a given alignment have diverged. Its main capability is checking whether your sequences (RNA, DNA, protein) follow a tree-like or a bundle-like pattern of divergence. This test is worth performing before building a tree for a set of sequences: if they diverged in a bundle-like fashion, a tree is not a meaningful representation of their history.

The algorithm can also show to what extent different positions in an alignment of many related sequences are randomized, and therefore which positions have been constrained during evolution. This is done by splitting the alignment positions into two groups, treating each group as a separate alignment, and measuring the divergence within each group.
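Statistical geometry works on quartets of sequences. For four aligned binary (R/Y) sequences, every column is either constant, a 3:1 pattern (one sequence deviates, which lengthens that sequence's peripheral segment), or a 2:2 pattern (one of the three possible splits of the quartet). The Hamming distance between sequences i and j then equals a_i + a_j plus the counts of all splits that separate i from j. The sketch below computes this decomposition, averages the sorted split lengths x <= y <= z over all quartets, and restricts an alignment to a chosen set of columns, as in the position-splitting analysis described above. All function names are hypothetical, and so is the qualitative reading used here: a z much larger than x and y hints at tree-like divergence, whereas small split lengths relative to the peripheral segments hint at a bundle. This is a simplification, not the exact statistics defined in the original papers.

```python
from itertools import combinations

BINARY = {"R", "Y"}  # assumption: sequences are already reduced to R/Y


def quartet_geometry(quartet):
    """Decompose four aligned binary sequences column by column.

    Returns (peripheral, splits):
      peripheral[i]: columns where sequence i alone differs (3:1 pattern)
      splits[k]: columns where sequence 0 groups with sequence k + 1
                 against the other two (2:2 pattern)
    Constant columns contribute nothing; columns with characters outside
    BINARY (gaps, ambiguity codes) are skipped, which is an assumption.
    """
    if len(quartet) != 4:
        raise ValueError("a quartet has exactly four sequences")
    peripheral = [0, 0, 0, 0]
    splits = [0, 0, 0]
    for column in zip(*quartet):
        states = set(column)
        if not states <= BINARY or len(states) == 1:
            continue  # non-binary character, or constant column
        matches = [i for i in range(4) if column[i] == column[0]]
        if len(matches) == 3:  # one sequence other than 0 deviates
            odd = next(i for i in range(4) if column[i] != column[0])
            peripheral[odd] += 1
        elif len(matches) == 1:  # sequence 0 deviates
            peripheral[0] += 1
        else:  # 2:2 split, matches == [0, partner]
            splits[matches[1] - 1] += 1
    return peripheral, splits


def predicted_distance(peripheral, splits, i, j):
    """Distance implied by the decomposition (i != j):
    d(i, j) = a_i + a_j + every split that puts i and j on opposite sides."""
    if 0 in (i, j):
        partner_of_0 = i + j  # the index of the pair that is not 0
    else:
        partner_of_0 = ({1, 2, 3} - {i, j}).pop()
    # The only split that does not separate i and j pairs 0 with partner_of_0.
    return peripheral[i] + peripheral[j] + sum(splits) - splits[partner_of_0 - 1]


def average_geometry(alignment):
    """Average sorted split lengths (x <= y <= z) and mean peripheral
    segment length over all quartets of a binary alignment."""
    quartets = list(combinations(alignment, 4))
    if not quartets:
        raise ValueError("need at least four sequences")
    x = y = z = periph = 0.0
    for quartet in quartets:
        peripheral, splits = quartet_geometry(quartet)
        s = sorted(splits)
        x += s[0]
        y += s[1]
        z += s[2]
        periph += sum(peripheral) / 4
    n = len(quartets)
    return {"x": x / n, "y": y / n, "z": z / n, "peripheral": periph / n}


def select_columns(alignment, columns):
    """Hypothetical helper: the sub-alignment made of the given column indices."""
    return ["".join(seq[c] for c in columns) for seq in alignment]


quartet = ["RRRRYY", "RRYRYR", "YYRRYY", "YYRYYR"]
print(quartet_geometry(quartet))   # ([0, 1, 0, 1], [2, 1, 0])
print(average_geometry(quartet))   # {'x': 0.0, 'y': 1.0, 'z': 2.0, 'peripheral': 0.5}
```

To compare two groups of positions, one would call `average_geometry(select_columns(alignment, group))` once per group and contrast the results; how the groups are chosen is up to the analysis.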
The original description of the statistical geometry algorithm in sequence space can be found in the paper:
An example analysis can be found here:
Another very good starting point for understanding the algorithm is the Biophysical Chemistry paper by Kay Nieselt-Struwe, which reviews statistical geometry in sequence space and all its variants: