06-Sep-2017 v. 1.35
Version accepted by arXiv. It now contains a solution of the full problem without recourse to simple bounds.
Updates the paper to include a discussion of Tsallis entropy and where our work departs from an excellent paper by Frank (2009) on the Common Patterns of Nature.
This is a detailed description (70 pages, 54 figures, 70 references, 5 appendices, 14 R analyses and a lot of equations) of a theory of ergodic information conservation. By bounding the Hartley-Shannon information above and below, the theory predicts the canonical length distribution and alphabet distribution of a wide class of discrete systems at all scales, with no additional assumptions.
This is applied to both Proteins and Computer Programs, where it predicts, with very strong statistical support, that their length distribution is sharply unimodal, transitioning into an extraordinarily accurate power-law. In Music, it shows the same distribution and again predicts, with very strong statistical support, that however we categorise such systems (for example, categorising notes with and without duration), the categories are also power-laws of each other. In Texts, it predicts not only the observed Zipf's law of rank-ordered word frequency but also the length distribution of words (which follows the same distribution as for Proteins and Computer Programs). Finally, it predicts the power-law behaviour of the distribution of Elements in the Universe (and also in sea-water).
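
As a rough illustration of what a rank-ordered frequency power-law looks like in practice, the sketch below counts tokens, sorts the counts by rank, and fits a straight line to log(count) against log(rank). It is not the paper's R analysis: the function names `rank_frequency` and `loglog_slope` are hypothetical, the token list is synthetic (counts chosen to be roughly proportional to 1/rank), and a least-squares fit on log-log axes is only a quick check, not a rigorous statistical test of power-law behaviour.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return token counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank), ranks from 1."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data (an assumption for illustration only):
# 20 distinct tokens whose counts are roughly proportional to 1/rank.
tokens = []
for rank in range(1, 21):
    tokens.extend([f"w{rank}"] * (1200 // rank))

counts = rank_frequency(tokens)
slope = loglog_slope(counts)
print(f"fitted log-log slope: {slope:.2f}")
```

For data of this kind the fitted slope comes out close to -1, the exponent usually associated with Zipf's law. Real text would first need tokenising, and the fit is sensitive to the low-frequency tail.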