Treating th as a chunk, and its effect on string probability
Notation: Δ(·) is a functional that takes the difference between the log of its argument’s value at location 2 and the log of its argument’s value at location 1. Just as the usual interpretation of Δx is x2 – x1, we will now use Δf to mean log f(x2) – log f(x1).
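As a small worked instance of this notation (the values 1000 and 900 are made up for illustration, and the functional is written Δ here), suppose a quantity N takes the value 1000 at location 1 and 900 at location 2:

```latex
\Delta N = \log N_2 - \log N_1 = \log 900 - \log 1000 = \log \frac{900}{1000} = \log 0.9 < 0
```

So ΔN is the log of a ratio, and it is negative whenever the quantity shrinks, whatever base (greater than 1) the logarithm uses.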
State 1 is the original string; State 2 is the string when we consider th as a single symbol. I use the convention that, when no confusion may ensue, the variable that expresses the number of occurrences of a letter is represented by the same symbol as that letter. Thus the variable t represents the number of ts in a string. N1 is the number of symbols in State 1, N2 is the number of symbols in State 2, and don’t forget that N2 = N1 – th.
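The bookkeeping between the two states can be sketched in a few lines of Python. This is only an illustration: the sample string and the helper name `chunk` are made up here, and the helper merges each adjacent t, h pair from left to right.

```python
from collections import Counter


def chunk(symbols, first, second):
    """Merge each adjacent (first, second) pair into one symbol, left to right."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == first and symbols[i + 1] == second:
            out.append(first + second)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


text = "the thin moth hit that hat"  # made-up sample string
state1 = list(text)                   # State 1: one symbol per character
state2 = chunk(state1, "t", "h")      # State 2: "th" counted as a single symbol

c1, c2 = Counter(state1), Counter(state2)
N1, N2 = len(state1), len(state2)
th = c2["th"]                         # number of th chunks

print(N1, N2, th)
print(c1["t"], c2["t"], c1["h"], c2["h"])
```

Every th chunk removes one symbol from the string, which is why N2 = N1 – th, and it also removes one t and one h from the counts of the separate letters.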
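$$
\log \frac{\mathrm{Prob}(\text{State 2})}{\mathrm{Prob}(\text{State 1})} = -N_1\,\Delta N + t_1\,\Delta t + h_1\,\Delta h + \mathit{th}\,\log \frac{\mathrm{pr}_2(\mathit{th})}{\mathrm{pr}_2(t)\,\mathrm{pr}_2(h)}
$$

Here t1 and h1 are the counts of t and h in State 1; the counts in State 2 are t2 = t1 – th and h2 = h1 – th; ΔN = log N2 – log N1, and likewise for Δt and Δh; and pr2 is relative frequency in State 2, so that pr2(t) = t2 / N2. The quantity

$$
\log \frac{\mathrm{pr}_2(\mathit{th})}{\mathrm{pr}_2(t)\,\mathrm{pr}_2(h)}
$$

is roughly, but only roughly, the mutual information between t and h in the second model. Why is it only roughly that, since the expression looks just like the definition of mutual information?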
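A numerical check can make the change in log probability concrete. The sketch below assumes a unigram model in which each symbol's probability is its relative frequency in the string, and it uses base-2 logs; both are assumptions made here for illustration, as are the sample string and the helper names `log_prob` and `delta`. Under those assumptions, the log of Prob(State 2) / Prob(State 1) splits into –N1 ΔN + t1 Δt + h1 Δh plus th times the log of pr2(th) / (pr2(t) pr2(h)).

```python
import math
import re
from collections import Counter


def log_prob(symbols):
    """Log2 probability of a sequence under a unigram model whose
    probabilities are the relative frequencies in that same sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(c * math.log2(c / n) for c in counts.values())


def delta(x1, x2):
    """Difference of logs: log2(x2) - log2(x1)."""
    return math.log2(x2) - math.log2(x1)


text = "the thin moth hit that hat"    # made-up sample string
state1 = list(text)                     # State 1: single characters
state2 = re.findall(r"th|.", text)      # State 2: "th" as one symbol

c1, c2 = Counter(state1), Counter(state2)
N1, N2 = len(state1), len(state2)
th = c2["th"]

# Direct computation of log Prob(State 2) - log Prob(State 1).
lhs = log_prob(state2) - log_prob(state1)

# The same quantity assembled from the four terms.
pmi_like = math.log2((th / N2) / ((c2["t"] / N2) * (c2["h"] / N2)))
rhs = (-N1 * delta(N1, N2)
       + c1["t"] * delta(c1["t"], c2["t"])
       + c1["h"] * delta(c1["h"], c2["h"])
       + th * pmi_like)

print(lhs, rhs)
assert math.isclose(lhs, rhs)
```

The check needs at least one t and one h to remain outside th chunks in State 2; otherwise Δt or Δh involves the log of zero and the decomposition has to be handled as a limiting case.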