It is a rather amazing fact that if one asks for a function that captures how much "choice" there is in the selection of an event (to use Claude Shannon's term), one arrives at exactly the equation called entropy in statistical mechanics, i.e. \( S = -k \sum_i p_i \ln p_i \), where \( k \) is some constant. The obvious question, then, is whether this is a coincidence or something deep. We have spent a fair amount of time carefully going through thermal physics, ultimately arriving at entropy and drawing conclusions from the principle of increasing entropy, and throughout there was the ever-present notion of a quasi-static process, because we always wanted to stay in equilibrium so that we could use state functions. Shannon's notion of choice clearly has no direct relation to that thermal-physics entropy, but perhaps thermal physics is the wrong place to look for the connection. We are naturally led to invoke some notions from statistical mechanics.
Suppose we have a system A with entropy \( S(A) \) and another system B with entropy \( S(B) \). If we consider the composite system, its combined entropy will be \( S(A) + S(B) \). Concurrently, we can consider the number of states (however we define that) available to each system: \( \Omega_A \) states for system A and \( \Omega_B \) states for system B. How then do we get the total number of states for the combined system? The answer is \( \Omega_A \Omega_B \). If we are to look for a relation between the two concepts, it will have to obey
\begin{equation}
S(A) + S(B)= f( \Omega_A \Omega_B ) \hspace{10mm} \text{eq.1}
\end{equation}
where \( f \) is some undetermined function.
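If we additionally suppose that the entropy of each system separately is given by the same function of its own state count, \( S(A) = f(\Omega_A) \) and \( S(B) = f(\Omega_B) \) (an assumption not forced by anything above, but the natural one), then eq.1 becomes a functional equation for \( f \):
\[
f(\Omega_A \Omega_B) = f(\Omega_A) + f(\Omega_B) ,
\]
and under mild regularity assumptions (continuity or monotonicity) the only solutions are logarithms,
\[
f(\Omega) = k \ln \Omega ,
\]
which is Boltzmann's expression for the entropy of a system with \( \Omega \) accessible states.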
Let us for the moment switch gears to information theory. Say I have two random variables \( X \) and \( Y \) which can take on values \( x_i \) and \( y_i \) respectively. We now want some function \( I \) that characterizes the amount of "choice" associated with a specific probabilistic event. Clearly this function must satisfy some intuitive properties; I state only the one relevant for this discussion. If I observe two independent events \( x_i \) and \( y_i \), then surely the amount of choice in the pair should be the sum \( I(x_i) + I(y_i) \), and this should equal the information \( I(x_i y_i) \) of the joint outcome \( x_i y_i \). So we have
\begin{equation}
I(x_i y_i)= I(x_i) + I(y_i) \hspace{10mm} \text{eq.2}
\end{equation}
It should be clear now why in both cases the desired function turns out to be the natural logarithm.
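To spell out why (a sketch, assuming as in Shannon's setup that \( I \) depends only on the probability of the outcome, say \( I(x_i) = I(p) \) with \( p = P(X = x_i) \)): for independent events the joint probability factorizes into the product \( pq \), so eq.2 becomes the same multiplicative-to-additive functional equation as eq.1,
\[
I(pq) = I(p) + I(q) \quad \Longrightarrow \quad I(p) = -k \ln p ,
\]
with the sign chosen so that \( I(p) \geq 0 \) for \( p \leq 1 \): rarer outcomes carry more information.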
In both situations we are solving Cauchy's functional equation, but can we go deeper? As far as I can tell the answer is no (so far): no one has come up with a conceptual understanding that ties the two areas together. People use the word entropy in both situations, and it is the same equation, but that does not imply that the same concept is at work. Notice that the formula for entropy looks like an average, where the function being averaged is the natural logarithm. So we are calculating the average information over some probability distribution, but here comes the crucial point: who said it had to be the Boltzmann distribution, or any distribution relevant to physics? The differential equation for the simple harmonic oscillator appears in very disparate situations, yet no one argues for a deep connection between those situations.
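To make the "average" remark explicit (my own spelling-out, using the form \( I(p) = -k \ln p \) sketched above):
\[
S = \sum_i p_i \, I(p_i) = -k \sum_i p_i \ln p_i = \langle I \rangle ,
\]
and in the special case of \( \Omega \) equally likely states, \( p_i = 1/\Omega \), this collapses to \( k \ln \Omega \). The identity holds for any probability distribution whatsoever, which is exactly the point: nothing in the formula itself singles out the Boltzmann distribution or any other physically meaningful one.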
There is a paper by E. T. Jaynes called \textit{Information Theory and Statistical Mechanics} claiming to make the connection. I am not sure I buy it, since it merely takes advantage of the fact that the equation is the same in both cases and goes on from there. In fact, he goes on to derive the Boltzmann distribution by doing exactly the same kind of calculation done in statistical mechanics; the difference is how he interprets the calculation. He interprets it in such a way that one never needs to invoke any physical assumption! To quote part of the abstract:
" It is concluded that statistical mechanics need not be considered as a physical theory dependent for its validity on the truth of additional assumptions not contained in the laws of mechanics (such as ergodicity, metric transitivity, equal a priori probabilities etc)''
In other words, even if experiments falsified his predictions (which they don't), his interpretation would stay exactly where it is: this is provably what you get when you do statistical inference. It just so happens that this statistical inference, containing no physical assumptions or necessities, corresponds to reality. So if he takes the physics out of statistical mechanics, why should we conclude that he has found a connection between information theory and physics?
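For reference, here is a sketch of the calculation in question, my reconstruction of the standard maximum-entropy argument rather than a quotation from the paper: maximize \( -\sum_i p_i \ln p_i \) subject to normalization \( \sum_i p_i = 1 \) and a prescribed mean energy \( \sum_i p_i E_i = \langle E \rangle \) (with \( E_i \) the energy of microstate \( i \)), using Lagrange multipliers \( \lambda \) and \( \beta \):
\[
\frac{\partial}{\partial p_i}\Big[ -\sum_j p_j \ln p_j - \lambda \Big( \sum_j p_j - 1 \Big) - \beta \Big( \sum_j p_j E_j - \langle E \rangle \Big) \Big] = 0
\quad \Longrightarrow \quad
p_i = \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_j e^{-\beta E_j} .
\]
The algebra is the textbook statistical-mechanics derivation of the Boltzmann distribution; what changes in Jaynes's reading is only the interpretation, namely that the constraints encode our information rather than any physical assumption.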
Clearly there is something deep happening, but again, no one has gone further than commenting on the fact that the same equation appears in both fields. As a result, we can draw no conclusion.