There is no obvious reason why diverse discrete
systems (e.g. computer software, proteins, music,
texts) should have any properties in common.
However, by constraining the simplest measure of
total information, CoHSI (Conservation of Hartley-
Shannon Information), in a statistical mechanics
framework, we show that this directly predicts at
all scales the self-similarity of their observed length
distributions and other previously unsuspected common
properties. This prediction is confirmed for each
of these discrete systems. We distinguish two
essential discrete system forms: heterogeneous in which
individual components are sequentially assembled
from an alphabet of unique tokens (e.g. amino
acids in proteins), and homogeneous systems in
which each component is built from a single token
unique to that component (e.g. word frequencies in
texts). Heterogeneous systems are characterised by
an implicit distribution of component lengths, with a
sharp unimodal peak and a power-law tail, whereas
homogeneous systems reduce naturally to Zipf’s Law.
We show that very long components are inevitable
for heterogeneous systems, and that some discrete
systems such as texts exhibit both heterogeneous and
homogeneous behaviour. In systems with more than
one consistent token alphabet (e.g. digital music), the
alphabets themselves show a power-law relationship.
This paper had been presented to several audiences with good feedback and had also been read in great detail by a number of our colleagues, which improved our reasoning. My co-author and I believe it fundamental. As we ticked the box for open review, we feel we can share the reviewers' comments on this paper as a reflection of typical reviewing standards today.
The reviewers took over 3 months to produce the following. The reviews are inconsistent. Neither reviewer bothered to download the associated reproducibility suite, which generates each result independently. They mentioned only the power law and missed the full CoHSI derivation (the really important bit) completely, because they didn't read the paper properly and appear not to have read the Supplementary Materials containing the mathematics at all (although the second reviewer tried a bit harder). In effect, it is censorship by indifference.
Reviewer 1: The paper states a broad principle that diverse discrete systems share the same underlying conservation of Hartley-Shannon Information. The authors present examples from biology, music, computer programming, and literature to prove this point. While there is much empirical evidence being presented, I am not sure how these results should be taken. Does this conservation principle lead to meaningful consequences, and does this conservation principle arise from any meaningful physical mechanisms? There is only a modicum of analysis in the paper regarding the results. Furthermore, the paper is quite verbose. The paper could have been cut in half without losing much information content.
Reviewer 2: It is well known that many discrete systems have the common property described by the power law relation.
In my understand, this paper try to clarify the underrlying mechanism why diverse descrete systems can be described by the common power law.
By applying the conservation law of Hartley-Shannon Information, authors derive the power law from the Boltzmann distribution.
This is an interestiong work.
However, after that, authors demonstrate that the derived power law can describe diverse descrete systems using real data.
Unfortunately, I do not understand the significance of the demonstrations.
In my understand, the demonstrations only trace the well known fact that diverse descrete systems can be described by the power law relation.
In order to show significance of the proposed model, I believe that author should show the evidence of the conservation law of Hartley-Shannon Information or evidence of related assumptions of the fundamental framework of the proposed model.
When asked, the editors of ProcRoySoc A considered this to be acceptable reviewing. Notice the complete absence of any understanding of the scientific method. This is very disappointing, and we shan't bother again with this journal when reviewing standards have slipped so low and are defended editorially. It simply isn't worth the effort of making the submission. We feel it important to say something because, if young researchers get treated like this too, then it's shameful.
We left it in its submission livery (it's quite pretty) because it took a lot of time to format it thus, but just to reiterate, it was unanimously rejected and we do not wish to imply otherwise.