I received the following tweet yesterday from @ProbFact and decided to check it out in more detail:
To investigate the claim, I generated the following test: create four-category discrete distributions in which two of the categories each have probability 0.25 and the third category's probability varies between 0.1 and 0.4. The fourth category's probability equals 0.5 minus the third, so the four probabilities sum to one. I then computed the Shannon entropy of each distribution using the equation:

H(X) = -Σᵢ p(xᵢ) log₂ p(xᵢ)
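As a quick sanity check on the formula, here is a minimal sketch of the entropy computation (the helper name `shannon_entropy` is my own, not from the original code):

```python
import numpy as np

def shannon_entropy(p):
    # H(X) = -sum_i p(x_i) * log2 p(x_i), measured in bits
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p))

# A uniform distribution over four categories gives log2(4) = 2 bits,
# the maximum possible entropy for four categories.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
```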
I stored the value of H(X) for each value of the third probability and then plotted H(X) as a function of the third probability. The entropy in the plot is maximized when all four probabilities equal 0.25.
I then created another set of four-category distributions in which the first and third probabilities vary between 0.1 and 0.4, and the second and fourth probabilities equal 0.5 minus the first and third probabilities, respectively. Again I computed the Shannon entropy of each distribution. I then plotted the computed entropies on the following contour plot, which shows that entropy is maximized when all four probabilities equal 0.25:
import numpy as np
import matplotlib.pyplot as plt

#
# two dimensions
#
px1 = 0.25
px2 = 0.25
px3 = np.arange(0.1, 0.4, 0.01)
px4 = 1 - px1 - px2 - px3

H = -1 * (px1*np.log2(px1) + px2*np.log2(px2) +
          px3*np.log2(px3) + px4*np.log2(px4))

plt.plot(px3, H)
plt.ylabel('Shannon Entropy (bits)')
plt.xlabel('P(x3)')
plt.title('Entropy for Distribution of Four Categories')
plt.show()

#
# three dimensions
#
px1 = np.arange(0.1, 0.4, 0.01)
px2 = 0.5 - px1
px3 = np.arange(0.1, 0.4, 0.01)
px4 = 0.5 - px3

entropy_list = []
for i in range(len(px1)):
    entropy_sub_list = []
    for j in range(len(px3)):
        H = -1 * (px1[i]*np.log2(px1[i]) + px2[i]*np.log2(px2[i]) +
                  px3[j]*np.log2(px3[j]) + px4[j]*np.log2(px4[j]))
        entropy_sub_list.append(H)
    entropy_list.append(entropy_sub_list)

X, Y = np.meshgrid(px1, px3)
Z = np.array(entropy_list)  # rows index px1, columns index px3

V = np.arange(1.7, 2, 0.01)
plt.contour(X, Y, Z.transpose(), V)
plt.ylabel('P(x3)')
plt.xlabel('P(x1)')
plt.title('Entropy for Distribution of Four Categories')
plt.show()
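The same conclusion can be checked without plotting at all. This is a sketch of a brute-force search over a finer grid of the same family of distributions (p1, 0.5 − p1, p3, 0.5 − p3); the grid step of 0.001 is my own choice, not from the original code:

```python
import numpy as np

# Grid of candidate probabilities for the first and third categories
p = np.arange(0.1, 0.401, 0.001)
p1, p3 = np.meshgrid(p, p)

# Stack all four category probabilities and compute entropy vectorized
probs = np.stack([p1, 0.5 - p1, p3, 0.5 - p3])
H = -np.sum(probs * np.log2(probs), axis=0)

# Locate the grid point with maximum entropy
i, j = np.unravel_index(np.argmax(H), H.shape)
print(p1[i, j], p3[i, j], H[i, j])  # maximum sits at roughly (0.25, 0.25), H near 2 bits
```

The argmax lands at the uniform distribution, matching the contour plot's peak of 2 bits.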