Reference details

Author(s): Greg Warr, Les Hatton

Year: 2022f

Title: Reproducibility script for JSysBiology submission on protein multiplicity

Reference / View/Download: WarrHatton_JSysBioAug2022.zip

Synopsis and invited feedback

This work was or is being reviewed by domain-specific experts appointed independently.

If you would like to provide feedback just e-mail me here.

Synopsis

If we consider proteins as a single evolving system then knowledge of the emergent, global properties of this system is essential to understanding their evolution. Here we focus on the global property of multiplicity, defined as the number of species (or their equivalent) in which a given protein occurs identically. Conservation of Hartley-Shannon Information (CoHSI) is a probabilistic theory of discrete systems based in information theory and statistical mechanics (and thus mechanism-independent) that makes the following prediction. Proteins of identical length and identical sequence of amino acids that are shared between species will show a Zipfian power-law distribution of multiplicity.

This prediction was tested by interrogation of the full UniProtKB/TrEMBL protein sequence database (219,174,961 entries in release 2021_03), which was found to contain over 13 million such sequences (over 6% of the database) whose multiplicities ranged from 2 to approximately 10,600 species or equivalent, distributed across the 3 domains of life as well as the viruses. The multiplicities of these proteins show a distribution of remarkable mathematical precision; when the number of proteins with particular multiplicities was plotted in rank order an extremely precise and statistically highly robust Zipfian power-law was seen, satisfying criteria of both necessity and sufficiency. The power-law spans over 5,000-fold in multiplicity and over one million-fold in the number of different sequences of a given multiplicity. The viral sequences contribute to the precision of the power law distribution even though they represent fewer than 3% of the identified proteins. Considered separately, the protein multiplicities of each of the 3 domains of life and the viruses also show statistically robust power-laws.

The high precision and vast provenance of these results essentially rule out explanations based in coincidence or particular mechanisms, and we propose that purely probabilistic explanations that are independent of mechanism can be considered for the emergence of global properties in evolving systems.

Invited Feedback: None yet

Importance (/10, author rated :-) ): 9
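
To make the synopsis's definition of multiplicity concrete, here is a minimal Python sketch of the tally it describes. It is not the authors' reproducibility script. The function names, the `(species, sequence)` record layout and the toy data are all hypothetical, and the least-squares slope on log-log axes is a simple stand-in for illustration, not the statistical tests of necessity and sufficiency mentioned in the synopsis. Sorting the counts from largest to smallest is one reading of "plotted in rank order".

```python
from collections import Counter, defaultdict
import math


def multiplicities(records):
    """Map each distinct sequence to the number of distinct species holding it."""
    species_by_seq = defaultdict(set)
    for species, sequence in records:
        species_by_seq[sequence].add(species)
    return {seq: len(species) for seq, species in species_by_seq.items()}


def shared_multiplicity_counts(mult):
    """Count how many sequences have each multiplicity of 2 or more."""
    return Counter(m for m in mult.values() if m >= 2)


def rank_ordered(counts):
    """Sort the counts largest first and pair each with its rank (1, 2, ...)."""
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))


def loglog_slope(points):
    """Ordinary least-squares slope of log(y) against log(x)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy records: (species, amino-acid sequence).
records = [
    ("human", "MKT"), ("mouse", "MKT"), ("rat", "MKT"),
    ("human", "AAG"), ("mouse", "AAG"),
    ("yeast", "QQL"),
    ("human", "PPW"), ("rat", "PPW"),
]

mult = multiplicities(records)
counts = shared_multiplicity_counts(mult)
points = rank_ordered(counts)
print(mult, dict(counts), points, loglog_slope(points))
```

A roughly straight line on log-log axes is only a first indication of a power law. A toy data set this small says nothing about the distribution itself.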


Auto-generated: $Revision: 1.64 $, $Date: 2022/05/20 08:41:34 $, Copyright Les Hatton 2001-