Treating th as a chunk, and its effect on string probability

 

 

Notation: Δ(·) is a functional that takes the difference between the log of its argument’s value at location 2 and the log of its argument’s value at location 1. Just as the usual interpretation of Δx is x2 – x1, we will now use Δf to mean log f(x2) – log f(x1).
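As a small illustration of this log-difference notation, here is a minimal Python sketch. The function name log_difference and the choice of natural logarithm are assumptions made for the example; the notes do not fix a base. Note that the result is the same as the log of the ratio f(x2) / f(x1), which is why the notation is convenient for comparing two probabilities.

```python
import math

def log_difference(f, x1, x2):
    """Return log f(x2) - log f(x1) (natural log assumed)."""
    return math.log(f(x2)) - math.log(f(x1))

# Hypothetical example: f(x) = x**2 evaluated at x1 = 1 and x2 = e.
# log(e**2) - log(1) is 2, up to floating-point rounding.
value = log_difference(lambda x: x ** 2, 1.0, math.e)
print(value)

# The same quantity written as the log of a ratio.
ratio_form = math.log((math.e ** 2) / (1.0 ** 2))
print(ratio_form)
```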

 

 

State 1 is the original string; State 2 is the string when we consider th as a single symbol. I use the convention that, when no confusion may ensue, the variable that expresses the number of occurrences of a letter is represented by the same symbol as that letter. Thus the variable t represents the number of t’s in a string, and th the number of occurrences of th. N1 is the number of symbols in State 1, and N2 is the number of symbols in State 2; don’t forget that N2 = N1 – th, since each th that was two symbols in State 1 is one symbol in State 2.
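The two states and their symbol counts can be sketched in Python as follows. The sample string, the helper name chunk_th, and the decision to count spaces as symbols are all assumptions made for illustration; they are not taken from the notes.

```python
from collections import Counter

def chunk_th(text):
    """Return the State 2 symbol list: each 't' followed by 'h' becomes one symbol 'th'."""
    symbols = []
    i = 0
    while i < len(text):
        if text[i] == "t" and i + 1 < len(text) and text[i + 1] == "h":
            symbols.append("th")
            i += 2  # consume both letters of the chunk
        else:
            symbols.append(text[i])
            i += 1
    return symbols

text = "the other thing that matters"  # hypothetical sample string
state1 = list(text)                    # State 1: one symbol per character
state2 = chunk_th(text)                # State 2: 'th' treated as a single symbol

th = Counter(state2)["th"]             # number of th chunks
N1, N2 = len(state1), len(state2)

print(N1, N2, th)
assert N2 == N1 - th                   # each chunk removes exactly one symbol
```

In State 2 the counts of the free-standing letters also drop: the number of t symbols that remain is the State 1 count of t minus th, and likewise for h.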

 

Prob(State 2) / Prob(State 1)

 

 

 

 

 

 

 

is roughly, but only roughly, the mutual information between t and h in the second model. Why is it only roughly that, since the expression looks just like the definition of mutual information?
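For comparison, the textbook quantity for a single pair (usually called pointwise mutual information) can be computed from raw counts as in the sketch below. This is an illustrative helper, not the expression from the notes: the function name pmi, the use of natural log, and the use of one shared total for all three probabilities are assumptions. Which counts and which total to plug in is left to the caller, because the counts of t and h and the number of symbols differ between State 1 and State 2.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information log[ p(xy) / (p(x) * p(y)) ] from raw counts.

    All three probabilities are estimated as count / total, which is a
    simplifying assumption of this sketch.
    """
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log(p_xy / (p_x * p_y))

# Hypothetical counts: 4 occurrences of th, 7 of t, 4 of h, 28 symbols in all.
# Algebraically this is log(4 * 28 / (7 * 4)) = log 4.
print(pmi(4, 7, 4, 28))
```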